(54) Linear prediction coefficient generation during frame erasure or packet loss.



(57) A speech coding system robust to frame erasure (or packet loss) is described. Illustrative embodiments
are directed to a modified version of CCITT standard G.728. In the event of frame erasure, vectors
of an excitation signal are synthesized based on previously stored excitation signal vectors generated 
during non-erased frames. This synthesis differs for voiced and non-voiced speech. During erased 
frames, linear prediction filter coefficients are synthesized as a weighted extrapolation of a set of linear 
prediction filter coefficients determined during non-erased frames. The weighting factor is a number 
less than 1. This weighting accomplishes a bandwidth-expansion of peaks in the frequency response of 
a linear predictive filter. Computational complexity during erased frames is reduced through the 
elimination of certain computations needed during non-erased frames only. This reduction in computational
complexity offsets additional computation required for excitation signal synthesis and linear
prediction filter coefficient generation during erased frames.

Field of the Invention

The present invention relates generally to speech coding arrangements for use in wireless communication 
systems, and more particularly to the ways in which such speech coders function in the event of burst-like 
errors in wireless transmission.

Background of the Invention 

Many communication systems, such as cellular telephone and personal communications systems, rely on 
wireless channels to communicate information. In the course of communicating such information, wireless
communication channels can suffer from several sources of error, such as multipath fading. These error sources
can cause, among other things, the problem of frame erasure. An erasure refers to the total loss or substantial 
corruption of a set of bits communicated to a receiver. A frame is a predetermined fixed number of bits. 

If a frame of bits is totally lost, then the receiver has no bits to interpret. Under such circumstances, the 
receiver may produce a meaningless result. If a frame of received bits is corrupted and therefore unreliable,
the receiver may produce a severely distorted result. 

As the demand for wireless system capacity has increased, a need has arisen to make the best use of 
available wireless system bandwidth. One way to enhance the efficient use of system bandwidth is to employ 
a signal compression technique. For wireless systems which carry speech signals, speech compression (or 
speech coding) techniques may be employed for this purpose. Such speech coding techniques include
analysis-by-synthesis speech coders, such as the well-known code-excited linear prediction (or CELP) speech
coder.

The problem of packet loss in packet-switched networks employing speech coding arrangements is very
similar to frame erasure in the wireless context. That is, due to packet loss, a speech decoder may either fail
to receive a frame or receive a frame having a significant number of missing bits. In either case, the speech
decoder is presented with the same essential problem - the need to synthesize speech despite the loss of 
compressed speech information. Both "frame erasure" and "packet loss" concern a communication channel 
(or network) problem which causes the loss of transmitted bits. For purposes of this description, therefore, 
the term "frame erasure" may be deemed synonymous with packet loss. 

CELP speech coders employ a codebook of excitation signals to encode an original speech signal. These
excitation signals are used to "excite" a linear predictive (LPC) filter which synthesizes a speech signal (or some
precursor to a speech signal) in response to the excitation. The synthesized speech signal is compared to the 
signal to be coded. The codebook excitation signal which most closely matches the original signal is identified. 
The identified excitation signal's codebook index is then communicated to a CELP decoder (depending upon
the type of CELP system, other types of information may be communicated as well). The decoder contains a
codebook identical to that of the CELP coder. The decoder uses the transmitted index to select an excitation 
signal from its own codebook. This selected excitation signal is used to excite the decoder's LPC filter. Thus 
excited, the LPC filter of the decoder generates a decoded (or quantized) speech signal — the same speech 
signal which was previously determined to be closest to the original speech signal. 

Wireless and other systems which employ speech coders may be more sensitive to the problem of frame
erasure than those systems which do not compress speech. This sensitivity is due to the reduced redundancy
of coded speech (compared to uncoded speech) making the possible loss of each communicated bit more
significant. In the context of a CELP speech coder experiencing frame erasure, excitation signal codebook
indices may be either lost or substantially corrupted. Because of the erased frame(s), the CELP decoder will
not be able to reliably identify which entry in its codebook should be used to synthesize speech. As a result,
speech coding system performance may degrade significantly. 

As a result of lost excitation signal codebook indices, normal techniques for synthesizing an excitation
signal in a decoder are ineffective. These techniques must therefore be replaced by alternative measures. A
further result of the loss of codebook indices is that the normal signals available for use in generating linear
prediction coefficients are unavailable. Therefore, an alternative technique for generating such coefficients is
needed.

Summary of the Invention 

The present invention generates linear prediction coefficient signals during frame erasure based on a
weighted extrapolation of linear prediction coefficient signals generated during a non-erased frame. This
weighted extrapolation accomplishes an expansion of the bandwidth of peaks in the frequency response of a
linear prediction filter.

Illustratively, linear prediction coefficient signals generated during a non-erased frame are stored in a
buffer memory. When a frame erasure occurs, the last "good" set of coefficient signals is weighted by a
bandwidth expansion factor raised to an exponent. The exponent is the index identifying the coefficient of interest.
The factor is a number in the range of 0.95 to 0.99. 

Brief Description of the Drawings 

Figure 1 presents a block diagram of a G.728 decoder modified in accordance with the present invention.

Figure 2 presents a block diagram of an illustrative excitation synthesizer of Figure 1 in accordance with
the present invention.

Figure 3 presents a block-flow diagram of the synthesis mode operation of an excitation synthesis
processor of Figure 2.

Figure 4 presents a block-flow diagram of an alternative synthesis mode operation of the excitation
synthesis processor of Figure 2.

Figure 5 presents a block-flow diagram of the LPC parameter bandwidth expansion performed by the
bandwidth expander of Figure 1.

Figure 6 presents a block diagram of the signal processing performed by the synthesis filter adapter of
Figure 1.

Figure 7 presents a block diagram of the signal processing performed by the vector gain adapter of
Figure 1.

Figures 8 and 9 present a modified version of an LPC synthesis filter adapter and vector gain adapter,
respectively, for G.728.

Figures 10 and 11 present an LPC filter frequency response and a bandwidth-expanded version of same,
respectively.

Figure 12 presents an illustrative wireless communication system in accordance with the present
invention.

Detailed Description 
I. Introduction

The present invention concerns the operation of a speech coding system experiencing frame erasure,
that is, the loss of a group of consecutive bits in the compressed bit-stream, which group is ordinarily used to
synthesize speech. The description which follows concerns features of the present invention applied
illustratively to the well-known 16 kbit/s low-delay CELP (LD-CELP) speech coding system adopted by the CCITT as
its international standard G.728 (for the convenience of the reader, the draft recommendation which was
adopted as the G.728 standard is attached hereto as an Appendix; the draft will be referred to herein as the "G.728
standard draft"). This description notwithstanding, those of ordinary skill in the art will appreciate that features
of the present invention have applicability to other speech coding systems.

The G.728 standard draft includes detailed descriptions of the speech encoder and decoder of the standard
(see G.728 standard draft, sections 3 and 4). The first illustrative embodiment concerns modifications to the
decoder of the standard. While no modifications to the encoder are required to implement the present invention,
the present invention may be augmented by encoder modifications. In fact, one illustrative speech coding
system described below includes a modified encoder.

Knowledge of the erasure of one or more frames is an input to the illustrative embodiment of the present
invention. Such knowledge may be obtained in any of the conventional ways well known in the art. For example,
frame erasures may be detected through the use of a conventional error detection code. Such a code would 
be implemented as part of a conventional radio transmission/reception subsystem of a wireless communication 
system. 

For purposes of this description, the output signal of the decoder's LPC synthesis filter, whether in the
speech domain or in a domain which is a precursor to the speech domain, will be referred to as the "speech
signal." Also, for clarity of presentation, an illustrative frame will be an integral multiple of the length of an
adaptation cycle of the G.728 standard. This illustrative frame length is, in fact, reasonable and allows presentation
of the invention without loss of generality. It may be assumed, for example, that a frame is 10 ms in duration,
or four times the length of a G.728 adaptation cycle. The adaptation cycle is 20 samples and corresponds to
a duration of 2.5 ms.
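
By way of illustration only, the frame timing above may be checked with a short calculation. The sketch below is written in Python, which is not part of the original disclosure. The 8 kHz sampling rate is not stated directly; it is inferred here from 20 samples spanning 2.5 ms. The five-sample vector length is taken from the decoder description in Section II. All constant names are hypothetical.

```python
# Timing figures quoted in the text (names are illustrative, not from G.728).
SAMPLES_PER_CYCLE = 20   # one adaptation cycle
CYCLE_US = 2500          # 2.5 ms, held in microseconds to keep integer arithmetic
FRAME_US = 10000         # illustrative 10 ms frame
VECTOR_LEN = 5           # samples per excitation vector

sample_rate_hz = SAMPLES_PER_CYCLE * 1_000_000 // CYCLE_US   # inferred, not stated
cycles_per_frame = FRAME_US // CYCLE_US
samples_per_frame = cycles_per_frame * SAMPLES_PER_CYCLE
vectors_per_cycle = SAMPLES_PER_CYCLE // VECTOR_LEN
vectors_per_frame = samples_per_frame // VECTOR_LEN

# prints: 8000 4 80 4 16
print(sample_rate_hz, cycles_per_frame, samples_per_frame,
      vectors_per_cycle, vectors_per_frame)
```

Under these assumptions an illustrative 10 ms frame spans 80 samples, or 16 excitation vectors.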

For clarity of explanation, the illustrative embodiment of the present invention is presented as comprising 
individual functional blocks. The functions these blocks represent may be provided through the use of either
shared or dedicated hardware, including, but not limited to, hardware capable of executing software. For
example, the blocks presented in Figures 1, 2, 6, and 7 may be provided by a single shared processor. (Use of
the term "processor" should not be construed to refer exclusively to hardware capable of executing software.)

Illustrative embodiments may comprise digital signal processor (DSP) hardware, such as the AT&T DSP16
or DSP32C, read-only memory (ROM) for storing software performing the operations discussed below, and
random access memory (RAM) for storing DSP results. Very large scale integration (VLSI) hardware
embodiments, as well as custom VLSI circuitry in combination with a general purpose DSP circuit, may also be
provided.

II. An Illustrative Embodiment

Figure 1 presents a block diagram of a G.728 LD-CELP decoder modified in accordance with the present
invention (Figure 1 is a modified version of figure 3 of the G.728 standard draft). In normal operation (i.e.,
without experiencing frame erasure) the decoder operates in accordance with G.728. It first receives codebook
indices, i, from a communication channel. Each index represents a vector of five excitation signal samples
which may be obtained from excitation VQ codebook 29. Codebook 29 comprises gain and shape codebooks
as described in the G.728 standard draft. Codebook 29 uses each received index to extract an excitation
codevector. The extracted codevector is that which was determined by the encoder to be the best match with
the original signal. Each extracted excitation codevector is scaled by gain amplifier 31. Amplifier 31 multiplies
each sample of the excitation vector by a gain determined by vector gain adapter 300 (the operation of vector
gain adapter 300 is discussed below). Each scaled excitation vector, ET, is provided as an input to an excitation
synthesizer 100. When no frame erasures occur, synthesizer 100 simply outputs the scaled excitation vectors
without change. Each scaled excitation vector is then provided as input to an LPC synthesis filter 32. The LPC
synthesis filter 32 uses LPC coefficients provided by a synthesis filter adapter 330 through switch 120 (switch
120 is configured according to the "dashed" line when no frame erasure occurs; the operation of synthesis
filter adapter 330, switch 120, and bandwidth expander 115 is discussed below). Filter 32 generates decoded
(or "quantized") speech. Filter 32 is a 50th order synthesis filter capable of introducing periodicity in the
decoded speech signal (such periodicity enhancement generally requires a filter of order greater than 20). In
accordance with the G.728 standard, this decoded speech is then postfiltered by operation of postfilter 34 and
postfilter adapter 35. Once postfiltered, the format of the decoded speech is converted to an appropriate
standard format by format converter 28. This format conversion facilitates subsequent use of the decoded speech
by other systems.

A. Excitation Signal Synthesis During Frame Erasure 

In the presence of frame erasures, the decoder of Figure 1 does not receive reliable information (if it
receives anything at all) concerning which vector of excitation signal samples should be extracted from codebook
29. In this case, the decoder must obtain a substitute excitation signal for use in synthesizing a speech signal.
The generation of a substitute excitation signal during periods of frame erasure is accomplished by excitation
synthesizer 100.

Figure 2 presents a block diagram of an illustrative excitation synthesizer 100 in accordance with the
present invention. During frame erasures, excitation synthesizer 100 generates one or more vectors of excitation
signal samples based on previously determined excitation signal samples. These previously determined
excitation signal samples were extracted with use of previously received codebook indices received from the
communication channel. As shown in Figure 2, excitation synthesizer 100 includes tandem switches 110, 130 and
excitation synthesis processor 120. Switches 110, 130 respond to a frame erasure signal to switch the mode
of the synthesizer 100 between normal mode (no frame erasure) and synthesis mode (frame erasure). The
frame erasure signal is a binary flag which indicates whether the current frame is normal (e.g., a value of "0")
or erased (e.g., a value of "1"). This binary flag is refreshed for each frame.

1. Normal Mode

In normal mode (shown by the dashed lines in switches 110 and 130), synthesizer 100 receives gain-scaled
excitation vectors, ET (each of which comprises five excitation sample values), and passes those vectors to
its output. Vector sample values are also passed to excitation synthesis processor 120. Processor 120 stores
these sample values in a buffer, ETPAST, for subsequent use in the event of frame erasure. ETPAST holds
200 of the most recent excitation signal sample values (i.e., 40 vectors) to provide a history of recently received
(or synthesized) excitation signal values. When ETPAST is full, each successive vector of five samples pushed
into the buffer causes the oldest vector of five samples to fall out of the buffer. (As will be discussed below
with reference to the synthesis mode, the history of vectors may include those vectors generated in the event
of frame erasure.)
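
For purposes of illustration only, the behavior of buffer ETPAST described above may be sketched as a fixed-length first-in, first-out history. The Python class below is a hypothetical sketch, not the disclosed implementation. The class and method names, the zero-filled initial state, and the convention that a lag of 1 denotes the most recent sample are all assumptions.

```python
from collections import deque

VECTOR_LEN = 5     # samples per excitation vector ET
ETPAST_LEN = 200   # 40 vectors of history


class ExcitationHistory:
    """Fixed-length history of gain-scaled excitation samples (ETPAST)."""

    def __init__(self):
        # Assumption: the buffer starts zero-filled; the text does not say.
        self.samples = deque([0.0] * ETPAST_LEN, maxlen=ETPAST_LEN)

    def push(self, et):
        """Append one ET vector; the oldest five samples fall out."""
        if len(et) != VECTOR_LEN:
            raise ValueError("ET must hold exactly five samples")
        self.samples.extend(et)

    def from_lag(self, lag, count=VECTOR_LEN):
        """Return `count` consecutive samples, the oldest being `lag` samples in the past."""
        if not count <= lag <= ETPAST_LEN:
            raise ValueError("lag out of range")
        data = list(self.samples)
        start = ETPAST_LEN - lag
        return data[start:start + count]
```

A `deque` with `maxlen` discards entries from the opposite end on `extend`, which matches the "oldest vector falls out" behavior without explicit index bookkeeping.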

2. Synthesis Mode

In synthesis mode (shown by the solid lines in switches 110 and 130), synthesizer 100 decouples the
gain-scaled excitation vector input and couples the excitation synthesis processor 120 to the synthesizer output.
Processor 120, in response to the frame erasure signal, operates to synthesize excitation signal vectors.

Figure 3 presents a block-flow diagram of the operation of processor 120 in synthesis mode. At the outset
of processing, processor 120 determines whether erased frame(s) are likely to have contained voiced speech
(see step 1201). This may be done by conventional voiced speech detection on past speech samples. In the
context of the G.728 decoder, a signal PTAP is available (from the postfilter) which may be used in a voiced
speech decision process. PTAP represents the optimal weight of a single-tap pitch predictor for the decoded
speech. If PTAP is large (e.g., close to 1), then the erased speech is likely to have been voiced. If PTAP is
small (e.g., close to 0), then the erased speech is likely to have been non-voiced (i.e., unvoiced speech,
silence, noise). An empirically determined threshold, VTH, is used to make a decision between voiced and
non-voiced speech. This threshold is equal to 0.6/1.4 (where 0.6 is a voicing threshold used by the G.728 postfilter
and 1.4 is an experimentally determined number which reduces the threshold so as to err on the side of voiced
speech).
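
The voicing decision of step 1201 may be illustrated by the following minimal Python sketch. The function name is hypothetical, and the use of a strict "greater than" comparison at the threshold is an assumption; the text states only that VTH separates voiced from non-voiced speech.

```python
# Threshold from the text: the postfilter voicing threshold 0.6, lowered by 1.4.
VTH = 0.6 / 1.4   # approximately 0.4286


def is_voiced(ptap, vth=VTH):
    """Step 1201 (sketch): classify the erased speech from the pitch-predictor tap."""
    # Assumption: values exactly at the threshold are treated as non-voiced.
    return ptap > vth
```

For example, a PTAP of 0.5 would fall below the postfilter threshold of 0.6 but above VTH, and so would be treated as voiced under this sketch.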

If the erased frame(s) is determined to have contained voiced speech, a new gain-scaled excitation vector
ET is synthesized by locating a vector of samples within buffer ETPAST, the earliest of which is KP samples
in the past (see step 1204). KP is a sample count corresponding to one pitch-period of voiced speech. KP may
be determined conventionally from decoded speech; however, the postfilter of the G.728 decoder has this
value already computed. Thus, the synthesis of a new vector, ET, comprises an extrapolation (e.g., copying) of
a set of 5 consecutive samples into the present. Buffer ETPAST is updated to reflect the latest synthesized
vector of sample values, ET (see step 1206). This process is repeated until a good (non-erased) frame is
received (see steps 1208 and 1209). The process of steps 1204, 1206, 1208 and 1209 amounts to a periodic
repetition of the last KP samples of ETPAST and produces a periodic sequence of ET vectors in the erased frame(s)
(where KP is the period). When a good (non-erased) frame is received, the process ends.
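
The periodic repetition performed by steps 1204 and 1206 may be sketched as follows. This Python function is illustrative only. Its name, its list-based buffer, and the choice to synthesize a caller-specified number of vectors (rather than polling for the next good frame as in steps 1208 and 1209) are assumptions. It further assumes KP is at least the vector length of five samples.

```python
def synthesize_voiced(etpast, kp, num_vectors, vector_len=5):
    """Copy samples from KP in the past, one ET vector at a time (steps 1204, 1206)."""
    if not vector_len <= kp <= len(etpast):
        raise ValueError("KP must lie between the vector length and the buffer length")
    history = list(etpast)
    vectors = []
    for _ in range(num_vectors):
        start = len(history) - kp                 # earliest sample is KP in the past
        et = history[start:start + vector_len]    # step 1204: extrapolate by copying
        vectors.append(et)
        history = history[vector_len:] + et       # step 1206: update ETPAST
    return vectors, history
```

Because each synthesized vector is pushed back into the history before the next copy, the output repeats the last KP samples of the original buffer with period KP.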

If the erased frame(s) is determined to have contained non-voiced speech (by step 1201), then a different
synthesis procedure is implemented. An illustrative synthesis of ET vectors is based on a randomized
extrapolation of groups of five samples in ETPAST. This randomized extrapolation procedure begins with the
computation of an average magnitude of the most recent 40 samples of ETPAST (see step 1210). This average
magnitude is designated as AVMAG. AVMAG is used in a process which ensures that extrapolated ET vector
samples have the same average magnitude as the most recent 40 samples of ETPAST.

A random integer number, NUMR, is generated to introduce a measure of randomness into the excitation
synthesis process. This randomness is important because the erased frame contained unvoiced speech (as
determined by step 1201). NUMR may take on any integer value between 5 and 40, inclusive (see step 1212).
Five consecutive samples of ETPAST are then selected, the oldest of which is NUMR samples in the past (see
step 1214). The average magnitude of these selected samples is then computed (see step 1216). This average
magnitude is termed VECAV. A scale factor, SF, is computed as the ratio of AVMAG to VECAV (see step 1218).
Each sample selected from ETPAST is then multiplied by SF. The scaled samples are then used as the
synthesized samples of ET (see step 1220). These synthesized samples are also used to update ETPAST as
described above (see step 1222).

If more synthesized samples are needed to fill an erased frame (see step 1224), steps 1212-1222 are
repeated until the erased frame has been filled. If a consecutive subsequent frame(s) is also erased (see step
1226), steps 1210-1224 are repeated to fill the subsequent erased frame(s). When all consecutive erased
frames are filled with synthesized ET vectors, the process ends.
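
Steps 1210 through 1224 for a single erased frame may be sketched as shown below. The Python function is a hypothetical illustration. Its name, the injectable random number generator, and the guard against a zero VECAV are assumptions not found in the text.

```python
import random


def synthesize_nonvoiced(etpast, num_vectors, rng=None, vector_len=5):
    """Randomized extrapolation for one erased frame (steps 1210-1224, sketch)."""
    rng = rng or random.Random()
    history = list(etpast)
    recent = history[-40:]
    avmag = sum(abs(x) for x in recent) / len(recent)       # step 1210
    vectors = []
    for _ in range(num_vectors):
        numr = rng.randint(5, 40)                           # step 1212 (inclusive)
        start = len(history) - numr
        picked = history[start:start + vector_len]          # step 1214
        vecav = sum(abs(x) for x in picked) / vector_len    # step 1216
        # Assumption: emit zeros if the picked samples are all zero.
        sf = avmag / vecav if vecav > 0.0 else 0.0          # step 1218
        et = [sf * x for x in picked]                       # step 1220
        vectors.append(et)
        history = history[vector_len:] + et                 # step 1222
    return vectors, history
```

AVMAG is computed once per call, so a caller handling several consecutive erased frames would invoke the function once per frame, passing the returned history back in, which corresponds to repeating steps 1210-1224.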

3. Alternative Synthesis Mode for Non-voiced Speech 

Figure 4 presents a block-flow diagram of an alternative operation of processor 120 in excitation synthesis
mode. In this alternative, processing for voiced speech is identical to that described above with reference to
Figure 3. The difference between alternatives is found in the synthesis of ET vectors for non-voiced speech.
Because of this, only that processing associated with non-voiced speech is presented in Figure 4.

As shown in the Figure, synthesis of ET vectors for non-voiced speech begins with the computation of
correlations between the most recent block of 30 samples stored in buffer ETPAST and every other block of 30
samples of ETPAST which lags the most recent block by between 31 and 170 samples (see step 1230). For
example, the most recent 30 samples of ETPAST are first correlated with a block of samples between ETPAST
samples 32-61, inclusive. Next, the most recent block of 30 samples is correlated with samples of ETPAST
between 33-62, inclusive, and so on. The process continues for all blocks of 30 samples up to the block
containing samples between 171-200, inclusive.

For all computed correlation values greater than a threshold value, THC, a time lag (MAXI) corresponding
to the maximum correlation is determined (see step 1232).

Next, tests are made to determine whether the erased frame likely exhibited very low periodicity. Under
circumstances of such low periodicity, it is advantageous to avoid the introduction of artificial periodicity into
the ET vector synthesis process. This is accomplished by varying the value of time lag MAXI. If either (i) PTAP
is less than a threshold, VTH1 (see step 1234), or (ii) the maximum correlation corresponding to MAXI is less
than a constant, MAXC (see step 1236), then very low periodicity is found. As a result, MAXI is incremented
by 1 (see step 1238). If neither of conditions (i) and (ii) is satisfied, MAXI is not incremented. Illustrative values
for VTH1 and MAXC are 0.3 and 3×10^7, respectively.

MAXI is then used as an index to extract a vector of samples from ETPAST. The earliest of the extracted
samples is MAXI samples in the past. These extracted samples serve as the next ET vector (see step 1240).
As before, buffer ETPAST is updated with the newest ET vector samples (see step 1242).

If additional samples are needed to fill the erased frame (see step 1244), then steps 1234-1242 are
repeated. After all samples in the erased frame have been filled, samples in each subsequent erased frame are
filled (see step 1246) by repeating steps 1230-1244. When all consecutive erased frames are filled with
synthesized ET vectors, the process ends.
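
The alternative procedure of Figure 4 may be sketched, for a single erased frame, as follows. This Python code is a hedged illustration only, and several points are assumptions. The text gives no value for THC, so it is a required argument. The text does not say what happens when no correlation exceeds THC, so the sketch raises an error. A lag is interpreted as the shift, in samples, between the most recent block and the candidate block. The clamp on MAXI is added only to keep the index inside the buffer. Function names are hypothetical.

```python
def best_lag(etpast, thc, block=30, min_lag=31, max_lag=170):
    """Steps 1230-1232 (sketch): best lag whose correlation exceeds THC, else None."""
    n = len(etpast)
    recent = etpast[n - block:]
    best = None
    for lag in range(min_lag, max_lag + 1):
        start = n - block - lag
        candidate = etpast[start:start + block]
        corr = sum(x * y for x, y in zip(recent, candidate))
        if corr > thc and (best is None or corr > best[1]):
            best = (lag, corr)
    return best


def synthesize_nonvoiced_alt(etpast, num_vectors, ptap, thc,
                             vth1=0.3, maxc=3e7, vector_len=5):
    """Correlation-based extrapolation for one erased frame (steps 1230-1244, sketch)."""
    history = list(etpast)
    found = best_lag(history, thc)
    if found is None:
        # Assumption: the source does not define this case.
        raise ValueError("no correlation exceeds THC")
    maxi, max_corr = found
    vectors = []
    for _ in range(num_vectors):
        if ptap < vth1 or max_corr < maxc:          # steps 1234 and 1236
            maxi = min(maxi + 1, len(history))      # step 1238 (clamp is an assumption)
        start = len(history) - maxi
        et = history[start:start + vector_len]      # step 1240
        vectors.append(et)
        history = history[vector_len:] + et         # step 1242
    return vectors, history
```

When periodicity is judged very low, MAXI grows by one for each synthesized vector, so successive vectors are drawn from a slowly drifting position in the history rather than from a fixed period.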

B. LPC Filter Coefficients for Erased Frames 

In addition to the synthesis of gain-scaled excitation vectors, ET, LPC filter coefficients must be generated
during erased frames. In accordance with the present invention, LPC filter coefficients for erased frames are
generated through a bandwidth expansion procedure. This bandwidth expansion procedure helps account for
uncertainty in the LPC filter frequency response in erased frames. Bandwidth expansion softens the
sharpness of peaks in the LPC filter frequency response.

Figure 10 presents an illustrative LPC filter frequency response based on LPC coefficients determined
for a non-erased frame. As can be seen, the response contains certain "peaks." It is the proper location of these
peaks during frame erasure which is a matter of some uncertainty. For example, the correct frequency response
for a consecutive frame might look like the response of Figure 10 with the peaks shifted to the right or to the
left. During frame erasure, since decoded speech is not available to determine LPC coefficients, these
coefficients (and hence the filter frequency response) must be estimated. Such an estimation may be accomplished
through bandwidth expansion. The result of an illustrative bandwidth expansion is shown in Figure 11. As
may be seen from Figure 11, the peaks of the frequency response are attenuated, resulting in an expanded
3 dB bandwidth of the peaks. Such attenuation helps account for shifts in a "correct" frequency response which
cannot be determined because of frame erasure.
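
The attenuation of a spectral peak may be illustrated numerically with a small, hypothetical example. The Python sketch below does not reproduce Figures 10 and 11. It assumes a synthesis filter of the form 1/(1 + a_1 z^-1 + ... + a_M z^-M), uses a two-pole resonator in place of the 50th-order filter, and applies the index-power weighting given by expression (1) later in this description with an assumed factor of 0.97. The pole radius, pole angle, and all names are illustrative assumptions.

```python
import cmath
import math


def lpc_magnitude(a, omega):
    """|1 / A(e^{j omega})| for A(z) = 1 + sum_i a[i-1] * z^-i (assumed sign convention)."""
    z_inv = cmath.exp(-1j * omega)
    denom = 1 + sum(a_i * z_inv ** i for i, a_i in enumerate(a, start=1))
    return 1.0 / abs(denom)


# Hypothetical two-pole resonator: pole radius r, pole angle theta (radians).
r, theta = 0.98, math.pi / 4
a = [-2.0 * r * math.cos(theta), r * r]

bef = 0.97   # assumed bandwidth expansion factor
a_expanded = [(bef ** i) * a_i for i, a_i in enumerate(a, start=1)]

peak_before = lpc_magnitude(a, theta)
peak_after = lpc_magnitude(a_expanded, theta)
```

Under this convention the weighting is equivalent to replacing A(z) by A(z/BEF), which scales each pole radius by BEF; the pole moves away from the unit circle, so the peak at the resonant frequency is lower and broader.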

According to the G.728 standard, LPC coefficients are updated at the third vector of each four-vector
adaptation cycle. The presence of erased frames need not disturb this timing. As with conventional G.728, new LPC
coefficients are computed at the third vector ET during a frame. In this case, however, the ET vectors are
synthesized during an erased frame.

As shown in Figure 1, the embodiment includes a switch 120, a buffer 110, and a bandwidth expander 115.
During normal operation switch 120 is in the position indicated by the dashed line. This means that the LPC
coefficients, a_i, are provided to the LPC synthesis filter by the synthesis filter adapter 33. Each set of newly
adapted coefficients, a_i, is stored in buffer 110 (each new set overwriting the previously saved set of
coefficients). Advantageously, bandwidth expander 115 need not operate in normal mode (if it does, its output goes
unused since switch 120 is in the dashed position).

Upon the occurrence of a frame erasure, switch 120 changes state (as shown in the solid line position).
Buffer 110 contains the last set of LPC coefficients as computed with speech signal samples from the last good
frame. At the third vector of the erased frame, the bandwidth expander 115 computes new coefficients, a_i'.

Figure 5 is a block-flow diagram of the processing performed by the bandwidth expander 115 to generate
new LPC coefficients. As shown in the Figure, expander 115 extracts the previously saved LPC coefficients
from buffer 110 (see step 1151). New coefficients a_i' are generated in accordance with expression (1):

$$a_i' = (\mathrm{BEF})^{i}\, a_i, \qquad 1 \le i \le 50, \tag{1}$$

where BEF is a bandwidth expansion factor which illustratively takes on a value in the range 0.95-0.99 and is
advantageously set to 0.97 or 0.98 (see step 1153). These newly computed coefficients are then output (see step
1155). Note that coefficients a_i' are computed only once for each erased frame.
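
Expression (1) may be illustrated with the following minimal Python sketch. The function name and the list representation (element 0 holding a_1) are assumptions made for illustration; the default factor of 0.97 is one of the two values the text identifies as advantageous.

```python
def bandwidth_expand(a, bef=0.97):
    """Apply expression (1): weight coefficient a_i by BEF raised to the index i.

    `a` is assumed to hold a_1 ... a_M in order, so a[0] is a_1.
    """
    return [(bef ** i) * a_i for i, a_i in enumerate(a, start=1)]
```

Because the exponent grows with the coefficient index, higher-order coefficients are attenuated more strongly than lower-order ones.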

The newly computed coefficients are used by the LPC synthesis filter 32 for the entire erased frame. The
LPC synthesis filter uses the new coefficients as though they were computed under normal circumstances by
adapter 33. The newly computed LPC coefficients are also stored in buffer 110, as shown in Figure 1. Should
there be consecutive frame erasures, the newly computed LPC coefficients stored in the buffer 110 would be
used as the basis for another iteration of bandwidth expansion according to the process presented in Figure
5. Thus, the greater the number of consecutive erased frames, the greater the applied bandwidth expansion
(i.e., for the kth erased frame of a sequence of erased frames, the effective bandwidth expansion factor is
BEF^k).
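
The effective factor for consecutive erasures follows from applying the weighting of expression (1) repeatedly. In the derivation below, the parenthesized superscript is a notation introduced here only for illustration: it counts consecutive erased frames, with superscript (0) denoting the coefficients from the last good frame.

```latex
a_i^{(k)} = \mathrm{BEF}^{\,i}\, a_i^{(k-1)}
          = \left(\mathrm{BEF}^{\,i}\right)^{k} a_i^{(0)}
          = \left(\mathrm{BEF}^{\,k}\right)^{i} a_i^{(0)},
\qquad 1 \le i \le 50 .
```

For example, with BEF = 0.97 the third consecutive erased frame would use an effective factor of 0.97^3 = 0.912673.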

Other techniques for generating LPC coefficients during erased frames could be employed instead of the
bandwidth expansion technique described above. These include (i) the repeated use of the last set of LPC
coefficients from the last good frame and (ii) use of the synthesized excitation signal in the conventional G.728
LPC adapter 33.

C. Operation of Backward Adapters During Erased Frames



The decoder of the G.728 standard includes a synthesis filter adapter and a vector gain adapter (blocks
33 and 30, respectively, of figure 3, as well as figures 5 and 6, respectively, of the G.728 standard draft). Under
normal operation (i.e., operation in the absence of frame erasure), these adapters dynamically vary certain
parameter values based on signals present in the decoder. The decoder of the illustrative embodiment also
includes a synthesis filter adapter 330 and a vector gain adapter 300. When no frame erasure occurs, the
synthesis filter adapter 330 and the vector gain adapter 300 operate in accordance with the G.728 standard. The
operation of adapters 330, 300 differs from that of the corresponding adapters 33, 30 of G.728 only during erased
frames.

As discussed above, neither the update to LPC coefficients by adapter 330 nor the update to gain predictor 
parameters by adapter 300 is needed during the occurrence of erased frames. In the case of the LPC
coefficients, this is because such coefficients are generated through a bandwidth expansion procedure. In the case
of the gain predictor parameters, this is because excitation synthesis is performed in the gain-scaled domain. 
Because the outputs of blocks 330 and 300 are not needed during erased frames, signal processing operations 
performed by these blocks 330, 300 may be modified to reduce computational complexity. 

As may be seen in Figures 6 and 7, respectively, the adapters 330 and 300 each include several signal 
processing steps indicated by blocks (blocks 49-51 in figure 6; blocks 39-48 and 67 in figure 7). These blocks 
are generally the same as those defined by the G.728 standard draft. In the first good frame following one or 
more erased frames, both blocks 330 and 300 form output signals based on signals they stored in memory 
during an erased frame. Prior to storage, these signals were generated by the adapters based on an excitation 
signal synthesized during an erased frame. In the case of the synthesis filter adapter 330, the excitation signal 
is first synthesized into quantized speech prior to use by the adapter. In the case of vector gain adapter 300, 
the excitation signal is used directly. In either case, both adapters need to generate signals during an erased 
frame so that when the next good frame occurs, adapter output may be determined. 

Advantageously, a reduced number of signal processing operations normally performed by the adapters
of Figures 6 and 7 may be performed during erased frames. The operations which are performed are those
which are either (i) needed for the formation and storage of signals used in forming adapter output in a
subsequent good (i.e., non-erased) frame or (ii) needed for the formation of signals used by other signal processing
blocks of the decoder during erased frames. No additional signal processing operations are necessary. Blocks
330 and 300 perform a reduced number of signal processing operations responsive to the receipt of the frame
erasure signal, as shown in Figures 1, 6, and 7. The frame erasure signal either prompts modified processing
or causes the module not to operate.

Note that a reduction in the number of signal processing operations in response to a frame erasure is not
required for proper operation; blocks 330 and 300 could operate normally, as though no frame erasure has
occurred, with their output signals being ignored, as discussed above. Under normal conditions, operations (i)
and (ii) are performed. Reduced signal processing operations, however, allow the overall complexity of the
decoder to remain within the level of complexity established for a G.728 decoder under normal operation. Without
reducing operations, the additional operations required to synthesize an excitation signal and
bandwidth-expand LPC coefficients would raise the overall complexity of the decoder.

In the case of the synthesis filter adapter 330 presented in Figure 6, and with reference to the 
pseudo-code presented in the discussion of the "HYBRID WINDOWING MODULE" at pages 28-29 of the G.728 
standard draft, an illustrative reduced set of operations comprises (i) updating buffer memory SB using the 
synthesized speech (which is obtained by passing extrapolated ET vectors through a bandwidth expanded version 
of the last good LPC filter) and (ii) computing REXP in the specified manner using the updated SB buffer. 
In addition, because the G.728 embodiment uses a postfilter which employs 10th-order LPC coefficients 
and the first reflection coefficient during erased frames, the illustrative set of reduced operations further 
comprises (iii) the generation of signal values RTMP(1) through RTMP(11) (RTMP(12) through RTMP(51) not 
needed) and, (iv) with reference to the pseudo-code presented in the discussion of the "LEVINSON-DURBIN 
RECURSION MODULE" at pages 29-30 of the G.728 standard draft, Levinson-Durbin recursion is performed from 
order 1 to order 10 (with the recursion from order 11 through order 50 not needed). Note that bandwidth 
expansion is not performed. 

In the case of vector gain adapter 300 presented in Figure 7, an illustrative reduced set of operations 
comprises (i) the operations of blocks 67, 39, 40, 41, and 42, which together compute the offset-removed 
logarithmic gain (based on synthesized ET vectors) and GTMP, the input to block 43; (ii) with reference to the 
pseudo-code presented in the discussion of the "HYBRID WINDOWING MODULE" at pages 32-33, the operations of 
updating buffer memory SBLG with GTMP and updating REXPLG, the recursive component of the autocorrelation 
function; and (iii) with reference to the pseudo-code presented in the discussion of the "LOG-GAIN 
LINEAR PREDICTOR" at page 34, the operation of updating filter memory GSTATE with GTMP. Note that the 
functions of modules 44, 45, 47 and 48 are not performed. 

As a result of performing the reduced set of operations during erased frames (rather than all operations), 
the decoder can properly prepare for the next good frame and provide any needed signals during erased frames 
while reducing the computational complexity of the decoder. 



D. Encoder Modification 

As stated above, the present invention does not require any modification to the encoder of the G.728 
standard. However, such modifications may be advantageous under certain circumstances. For example, if a frame 
erasure occurs at the beginning of a talk spurt (e.g., at the onset of voiced speech from silence), then a 
synthesized speech signal obtained from an extrapolated excitation signal is generally not a good approximation 
of the original speech. Moreover, upon the occurrence of the next good frame there is likely to be a significant 
mismatch between the internal states of the decoder and those of the encoder. This mismatch of encoder and 
decoder states may take some time to converge. 

One way to address this circumstance is to modify the adapters of the encoder (in addition to the 
above-described modifications to those of the G.728 decoder) so as to improve convergence speed. Both the LPC 
filter coefficient adapter and the gain adapter (predictor) of the encoder may be modified by introducing a 
spectral smoothing technique (SST) and increasing the amount of bandwidth expansion. 

Figure 8 presents a modified version of the LPC synthesis filter adapter of figure 5 of the G.728 standard 
draft for use in the encoder. The modified synthesis filter adapter 230 includes hybrid windowing module 49, 
which generates autocorrelation coefficients; SST module 495, which performs a spectral smoothing of 
autocorrelation coefficients from windowing module 49; Levinson-Durbin recursion module 50, for generating 
synthesis filter coefficients; and bandwidth expansion module 510, for expanding the bandwidth of the spectral 
peaks of the LPC spectrum. The SST module 495 performs spectral smoothing of autocorrelation coefficients 
by multiplying the buffer of autocorrelation coefficients, RTMP(1) - RTMP(51), with the right half of a Gaussian 
window having a standard deviation of 60 Hz. This windowed set of autocorrelation coefficients is then applied 
to the Levinson-Durbin recursion module 50 in the normal fashion. Bandwidth expansion module 510 operates 
on the synthesis filter coefficients like module 51 of the G.728 standard draft but uses a bandwidth 
expansion factor of 0.96, rather than 0.988. 
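
The spectral smoothing step of SST module 495 can be sketched as follows. This is a minimal illustration and 
not the patent's own pseudo-code: the text gives only the window's standard deviation, so the closed form 
used here (a Gaussian in the lag domain whose Fourier transform is a Gaussian with the stated standard 
deviation in Hz), the 8 kHz sampling rate (implied by the 125 μs sample interval of the appended G.728 draft), 
and the function names `gaussian_lag_window` and `spectral_smooth` are all assumptions. 

```python
import math


def gaussian_lag_window(num_lags, sigma_hz, fs_hz=8000.0):
    """Right half of a Gaussian window, indexed by autocorrelation lag.

    Assumption: a lag-domain Gaussian exp(-0.5 * (2*pi*sigma*lag/fs)**2)
    corresponds to smoothing the power spectrum with a Gaussian whose
    standard deviation is sigma_hz.
    """
    return [math.exp(-0.5 * (2.0 * math.pi * sigma_hz * lag / fs_hz) ** 2)
            for lag in range(num_lags)]


def spectral_smooth(autocorr, sigma_hz, fs_hz=8000.0):
    """Multiply autocorrelation coefficients (lag 0 first) by the lag window."""
    window = gaussian_lag_window(len(autocorr), sigma_hz, fs_hz)
    return [r * w for r, w in zip(autocorr, window)]
```

Under these assumptions, the 51 values RTMP(1) through RTMP(51) would be passed as `autocorr` with 
`sigma_hz=60.0` before the Levinson-Durbin recursion. The lag-0 value is left unchanged and higher lags are 
attenuated progressively. 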

Figure 9 presents a modified version of the vector gain adapter of figure 6 of the G.728 standard draft for 
use in the encoder. The adapter 200 includes a hybrid windowing module 43, an SST module 435, a 
Levinson-Durbin recursion module 44, and a bandwidth expansion module 450. All blocks in Figure 9 are identical 
to those of figure 6 of the G.728 standard except for new blocks 435 and 450. Overall, modules 43, 435, 44, and 
450 are arranged like the modules of Figure 8 referenced above. Like SST module 495 of Figure 8, SST module 
435 of Figure 9 performs a spectral smoothing of autocorrelation coefficients by multiplying the buffer of 
autocorrelation coefficients, R(1) - R(11), with the right half of a Gaussian window. This time, however, the 
Gaussian window has a standard deviation of 45 Hz. Bandwidth expansion module 450 of Figure 9 operates on the 
synthesis filter coefficients like the bandwidth expansion module 51 of figure 6 of the G.728 standard draft, 
but uses a bandwidth expansion factor of 0.87, rather than 0.906. 

E. An Illustrative Wireless System 

As stated above, the present invention has application to wireless speech communication systems. Figure 
12 presents an illustrative wireless communication system employing an embodiment of the present invention. 
Figure 12 includes a transmitter 600 and a receiver 700. An illustrative embodiment of the transmitter 600 is 
a wireless base station. An illustrative embodiment of the receiver 700 is a mobile user terminal, such as a 
cellular or wireless telephone, or other personal communications system device. (Naturally, a wireless base 
station and user terminal may also include receiver and transmitter circuitry, respectively.) The transmitter 600 
includes a speech coder 610, which may be, for example, a coder according to CCITT standard G.728. The 
transmitter further includes a conventional channel coder 620 to provide error detection (or detection and 
correction) capability; a conventional modulator 630; and conventional radio transmission circuitry; all well 
known in the art. Radio signals transmitted by transmitter 600 are received by receiver 700 through a 
transmission channel. Due to, for example, possible destructive interference of various multipath components of 
the transmitted signal, receiver 700 may be in a deep fade preventing the clear reception of transmitted bits. 
Under such circumstances, frame erasure may occur. 

Receiver 700 includes conventional radio receiver circuitry 710, conventional demodulator 720, channel 
decoder 730, and a speech decoder 740 in accordance with the present invention. Note that the channel 
decoder generates a frame erasure signal whenever the channel decoder determines the presence of a 
substantial number of bit errors (or unreceived bits). Alternatively (or in addition to a frame erasure signal 
from the channel decoder), demodulator 720 may provide a frame erasure signal to the decoder 740. 

F. Discussion 

Although specific embodiments of this invention have been shown and described herein, it is to be 
understood that these embodiments are merely illustrative of the many possible specific arrangements which can 
be devised in application of the principles of the invention. Numerous and varied other arrangements can be 
devised in accordance with these principles by those of ordinary skill in the art without departing from the spirit 
and scope of the invention. 

For example, while the present invention has been described in the context of the G.728 LD-CELP speech 
coding system, features of the invention may be applied to other speech coding systems as well. For example, 
such coding systems may include a long-term predictor (or long-term synthesis filter) for converting a 
gain-scaled excitation signal to a signal having pitch periodicity. Or, such a coding system may not include a 
postfilter. 

In addition, the illustrative embodiment of the present invention is presented as synthesizing excitation 
signal samples based on previously stored gain-scaled excitation signal samples. However, the present 
invention may be implemented to synthesize excitation signal samples prior to gain-scaling (i.e., prior to 
operation of gain amplifier 31). Under such circumstances, gain values must also be synthesized (e.g., extrapolated). 

In the discussion above concerning the synthesis of an excitation signal during erased frames, synthesis 
was accomplished illustratively through an extrapolation procedure. It will be apparent to those of skill in the 
art that other synthesis techniques, such as interpolation, could be employed. 

As used herein, the term "filter" refers to conventional structures for signal synthesis, as well as other 
processes accomplishing a filter-like synthesis function. Such other processes include the manipulation of 
Fourier transform coefficients to achieve a filter-like result (with or without the removal of perceptually 
irrelevant information). 

APPENDIX 

Draft Recommendation G.728 

Coding of Speech at 16 kbit/s 
Using 
Low-Delay Code Excited Linear Prediction (LD-CELP) 



1. INTRODUCTION 

This recommendation contains the description of an algorithm for the coding of speech signals 
at 16 kbit/s using Low-Delay Code Excited Linear Prediction (LD-CELP). This recommendation 
is organized as follows. 

In Section 2 a brief outline of the LD-CELP algorithm is given. In Sections 3 and 4, the LD- 
CELP encoder and LD-CELP decoder principles are discussed, respectively. In Section 5, the 
computational details pertaining to each functional algorithmic block are defined. Annexes A, B, 
C and D contain tables of constants used by the LD-CELP algorithm. In Annex E the sequencing 
of variable adaptation and use is given. Finally, in Appendix I information is given on procedures 
applicable to the implementation verification of the algorithm. 

Under further study is the future incorporation of three additional appendices (to be published 
separately) consisting of LD-CELP network aspects, LD-CELP fixed-point implementation 
description, and LD-CELP fixed-point verification procedures. 

2. OUTLINE OF LD-CELP 

The LD-CELP algorithm consists of an encoder and a decoder described in Sections 2.1 and 
2.2 respectively, and illustrated in Figure 1/G.728. 

The essence of CELP techniques, which is an analysis-by-synthesis approach to codebook 
search, is retained in LD-CELP. The LD-CELP however, uses backward adaptation of predictors 
and gain to achieve an algorithmic delay of 0.625 ms. Only the index to the excitation codebook 
is transmitted. The predictor coefficients are updated through LPC analysis of previously 
quantized speech. The excitation gain is updated by using the gain information embedded in the 
previously quantized excitation. The block size for the excitation vector and gain adaptation is 5 
samples only. A perceptual weighting filter is updated using LPC analysis of the unquantized 
speech. 

2.1 LD-CELP Encoder 

After the conversion from A-law or μ-law PCM to uniform PCM, the input signal is 
partitioned into blocks of 5 consecutive input signal samples. For each input block, the encoder 
passes each of 1024 candidate codebook vectors (stored in an excitation codebook) through a gain 
scaling unit and a synthesis filter. From the resulting 1024 candidate quantized signal vectors, the 
encoder identifies the one that minimizes a frequency-weighted mean-squared error measure with 
respect to the input signal vector. The 10-bit codebook index of the corresponding best codebook 
vector (or "codevector") which gives rise to that best candidate quantized signal vector is 
transmitted to the decoder. The best codevector is then passed through the gain scaling unit and 
the synthesis filter to establish the correct filter memory in preparation for the encoding of the next 
signal vector. The synthesis filter coefficients and the gain are updated periodically in a backward 
adaptive manner based on the previously quantized signal and gain-scaled excitation. 

2.2 LD-CELP Decoder 

The decoding operation is also performed on a block-by-block basis. Upon receiving each 
10-bit index, the decoder performs a table look-up to extract the corresponding codevector from 
the excitation codebook. The extracted codevector is then passed through a gain scaling unit and 
a synthesis filter to produce the current decoded signal vector. The synthesis filter coefficients and 
the gain are then updated in the same way as in the encoder. The decoded signal vector is then 
passed through an adaptive postfilter to enhance the perceptual quality. The postfilter coefficients 
are updated periodically using the information available at the decoder. The 5 samples of the 
postfilter signal vector are next converted to 5 A-law or μ-law PCM output samples. 

3. LD-CELP ENCODER PRINCIPLES 

Figure 2/G.728 is a detailed block schematic of the LD-CELP encoder. The encoder in Figure 
2/G.728 is mathematically equivalent to the encoder previously shown in Figure 1/G.728 but is 
computationally more efficient to implement. 

In the following description, 

a. For each variable to be described, k is the sampling index and samples are taken at 125 μs 
intervals. 

b. A group of 5 consecutive samples in a given signal is called a vector of that signal. For 
example, 5 consecutive speech samples form a speech vector, 5 excitation samples form an 
excitation vector, and so on. 

c. We use n to denote the vector index, which is different from the sample index k. 

d. Four consecutive vectors build one adaptation cycle. In a later section, we also refer to 
adaptation cycles as frames. The two terms are used interchangeably. 

The excitation Vector Quantization (VQ) codebook index is the only information explicitly 
transmitted from the encoder to the decoder. Three other types of parameters will be periodically 
updated: the excitation gain, the synthesis filter coefficients, and the perceptual weighting filter 
coefficients. These parameters are derived in a backward adaptive manner from signals that occur 
prior to the current signal vector. The excitation gain is updated once per vector, while the 
synthesis filter coefficients and the perceptual weighting filter coefficients are updated once every 
4 vectors (i.e., a 20-sample, or 2.5 ms update period). Note that, although the processing sequence 
in the algorithm has an adaptation cycle of 4 vectors (20 samples), the basic buffer size is still 
only 1 vector (5 samples). This small buffer size makes it possible to achieve a one-way delay 
less than 2 ms. 
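
The timing figures quoted above follow directly from the 125 μs sample interval, the 5-sample vector, the 
4-vector adaptation cycle, and the 10-bit codebook index. The short calculation below is only a worked 
check of that arithmetic; the constant and variable names are chosen here for illustration and are not 
taken from the recommendation. 

```python
SAMPLE_PERIOD_US = 125   # sampling interval stated in item a.
VECTOR_SAMPLES = 5       # samples per vector
VECTORS_PER_CYCLE = 4    # vectors per adaptation cycle (frame)
INDEX_BITS = 10          # bits transmitted per vector

# Sampling rate implied by the 125 microsecond interval.
sample_rate_hz = 1_000_000 // SAMPLE_PERIOD_US

# Duration of one vector (the basic buffer) and of one adaptation cycle.
vector_ms = VECTOR_SAMPLES * SAMPLE_PERIOD_US / 1000
cycle_ms = VECTORS_PER_CYCLE * vector_ms

# One 10-bit index is sent for every 5 samples.
bit_rate_bps = INDEX_BITS * sample_rate_hz // VECTOR_SAMPLES

print(sample_rate_hz, vector_ms, cycle_ms, bit_rate_bps)
```

The results are an 8000 Hz sampling rate, a 0.625 ms vector, a 2.5 ms adaptation cycle, and a bit rate of 
16000 bit/s, matching the 16 kbit/s in the title of the recommendation. 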

A description of each block of the encoder is given below. Since the LD-CELP coder is 
mainly used for encoding speech, for convenience of description, in the following we will assume 
that the input signal is speech, although in practice it can be a non-speech signal as well. 

3.1 Input PCM Format Conversion 

This block converts the input A-law or μ-law PCM signal $s_o(k)$ to a uniform PCM signal $s_u(k)$. 

3.1.1 Internal Linear PCM Levels 

In converting from A-law or μ-law to linear PCM, different internal representations are 
possible, depending on the device. For example, standard tables for μ-law PCM define a linear 
range of -4015.5 to +4015.5. The corresponding range for A-law PCM is -2016 to +2016. Both 
tables list some output values having a fractional part of 0.5. These fractional parts cannot be 
represented in an integer device unless the entire table is multiplied by 2 to make all of the values 
integers. In fact, this is what is most commonly done in fixed point Digital Signal Processing 
(DSP) chips. On the other hand, floating point DSP chips can represent the same values listed in 
the tables. Throughout this document it is assumed that the input signal has a maximum range of 
-4095 to +4095. This encompasses both the μ-law and A-law cases. In the case of A-law it implies 
that when the linear conversion results in a range of -2016 to +2016, those values should be scaled 
up by a factor of 2 before continuing to encode the signal. In the case of μ-law input to a fixed 
point processor where the input range is converted to -8031 to +8031, it implies that values should 
be scaled down by a factor of 2 before beginning the encoding process. Alternatively, these 
values can be treated as being in Q1 format, meaning there is 1 bit to the right of the decimal 
point. All computation involving the data would then need to take this bit into account. 

For the case of 16-bit linear PCM input signals having the full dynamic range of -32768 to 
+32767, the input values should be considered to be in Q3 format. This means that the input 
values should be scaled down (divided) by a factor of 8. On output at the decoder the factor of 8 
would be restored for these signals. 
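
The scaling rules above can be summarized in a small helper. This is a sketch only: the function name 
`to_internal_range` and the source labels are hypothetical, and the code covers just the four cases named 
in the text (table-valued μ-law, A-law, μ-law doubled to integers, and 16-bit linear PCM). 

```python
# Scale factors that bring each representation into the nominal
# -4095..+4095 internal range assumed throughout the document.
# The dictionary keys are hypothetical labels, not standard names.
SCALE_TO_INTERNAL = {
    "mu_law_table": 1.0,     # -4015.5..+4015.5, used as listed
    "a_law_table": 2.0,      # -2016..+2016, scaled up by 2
    "mu_law_doubled": 0.5,   # -8031..+8031 (Q1), scaled down by 2
    "linear16": 1.0 / 8.0,   # -32768..+32767 (Q3), divided by 8
}


def to_internal_range(sample, source):
    """Scale one linear PCM sample into the internal encoder range."""
    return sample * SCALE_TO_INTERNAL[source]


def from_internal_range(sample, source):
    """Undo the scaling at the decoder output."""
    return sample / SCALE_TO_INTERNAL[source]
```

For example, an A-law peak of 2016 maps to 4032, a doubled μ-law peak of 8031 maps to 4015.5, and a 16-bit 
sample of 32760 maps to 4095. 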

3.2 Vector Buffer 

This block buffers 5 consecutive speech samples $s_u(5n), s_u(5n+1), \ldots, s_u(5n+4)$ to form a 
5-dimensional speech vector $s(n) = [s_u(5n), s_u(5n+1), \ldots, s_u(5n+4)]$. 

3.3 Adapter for Perceptual Weighting Filter 

Figure 4/G.728 shows the detailed operation of the perceptual weighting filter adapter (block 3 
in Figure 2/G.728). This adapter calculates the coefficients of the perceptual weighting filter once 
every 4 speech vectors based on linear prediction analysis (often referred to as LPC analysis) of 
unquantized speech. The coefficient updates occur at the third speech vector of every 4-vector 
adaptation cycle. The coefficients are held constant in between updates. 

Refer to Figure 4(a)/G.728. The calculation is performed as follows. First, the input 
(unquantized) speech vector is passed through a hybrid windowing module (block 36) which 
places a window on previous speech vectors and calculates the first 11 autocorrelation coefficients 
of the windowed speech signal as the output. The Levinson-Durbin recursion module (block 37) 
then converts these autocorrelation coefficients to predictor coefficients. Based on these predictor 
coefficients, the weighting filter coefficient calculator (block 38) derives the desired coefficients of 
the weighting filter. These three blocks are discussed in more detail below. 

First, let us describe the principles of hybrid windowing. Since this hybrid windowing 
technique will be used in three different kinds of LPC analyses, we first give a more general 
description of the technique and then specialize it to different cases. Suppose the LPC analysis is 
to be performed once every L signal samples. To be general, assume that the signal samples 
corresponding to the current LD-CELP adaptation cycle are $s_u(m), s_u(m+1), s_u(m+2), \ldots, 
s_u(m+L-1)$. Then, for backward-adaptive LPC analysis, the hybrid window is applied to all 
previous signal samples with a sample index less than m (as shown in Figure 4(b)/G.728). Let 
there be N non-recursive samples in the hybrid window function. Then, the signal samples 
$s_u(m-1), s_u(m-2), \ldots, s_u(m-N)$ are all weighted by the non-recursive portion of the window. 
Starting with $s_u(m-N-1)$, all signal samples to the left of (and including) this sample are weighted 
by the recursive portion of the window, which has values $b, b\alpha, b\alpha^2, \ldots$, where $0 < b < 1$ and 
$0 < \alpha < 1$. 

At time m, the hybrid window function $w_m(k)$ is defined as 

$$
w_m(k) =
\begin{cases}
f_m(k) = b\,\alpha^{-[k-(m-N-1)]}, & \text{if } k \le m-N-1 \\
g_m(k) = -\sin[c(k-m)], & \text{if } m-N \le k \le m-1 \\
0, & \text{if } k \ge m
\end{cases}
\tag{1a}
$$

and the window-weighted signal is 

$$
s_m(k) = s_u(k)\,w_m(k) =
\begin{cases}
s_u(k) f_m(k) = s_u(k)\, b\,\alpha^{-[k-(m-N-1)]}, & \text{if } k \le m-N-1 \\
s_u(k) g_m(k) = -s_u(k) \sin[c(k-m)], & \text{if } m-N \le k \le m-1 \\
0, & \text{if } k \ge m
\end{cases}
\tag{1b}
$$

The samples of the non-recursive portion $g_m(k)$ and the initial section of the recursive portion $f_m(k)$ for 
different hybrid windows are specified in Annex A. For an M-th order LPC analysis, we need to 
calculate M+1 autocorrelation coefficients $R_m(i)$ for $i = 0, 1, 2, \ldots, M$. The i-th autocorrelation 
coefficient for the current adaptation cycle can be expressed as 

$$
R_m(i) = \sum_{k=-\infty}^{m-1} s_m(k)\, s_m(k-i) = r_m(i) + \sum_{k=m-N}^{m-1} s_m(k)\, s_m(k-i) ,
\tag{1c}
$$

where 

$$
r_m(i) = \sum_{k=-\infty}^{m-N-1} s_m(k)\, s_m(k-i) = \sum_{k=-\infty}^{m-N-1} s_u(k)\, s_u(k-i)\, f_m(k)\, f_m(k-i) .
\tag{1d}
$$

On the right-hand side of equation (1c), the first term $r_m(i)$ is the "recursive component" of 
$R_m(i)$, while the second term is the "non-recursive component". The finite summation of the 
non-recursive component is calculated for each adaptation cycle. On the other hand, the recursive 
component is calculated recursively. The following paragraphs explain how. 

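Suppose we have calculated and stored all $r_m(i)$'s for the current adaptation cycle and want to 
go on to the next adaptation cycle, which starts at sample $s_u(m+L)$. After the hybrid window is 
shifted to the right by L samples, the new window-weighted signal for the next adaptation cycle 
becomes 

$$
s_{m+L}(k) = s_u(k)\,w_{m+L}(k) =
\begin{cases}
s_u(k) f_{m+L}(k) = s_u(k) f_m(k)\,\alpha^{L}, & \text{if } k \le m+L-N-1 \\
s_u(k) g_{m+L}(k) = -s_u(k) \sin[c(k-m-L)], & \text{if } m+L-N \le k \le m+L-1 \\
0, & \text{if } k \ge m+L
\end{cases}
\tag{1e}
$$

The recursive component of $R_{m+L}(i)$ can be written as 

$$
r_{m+L}(i) = \sum_{k=-\infty}^{m+L-N-1} s_{m+L}(k)\, s_{m+L}(k-i)
= \sum_{k=-\infty}^{m-N-1} s_u(k) f_m(k)\alpha^{L}\, s_u(k-i) f_m(k-i)\alpha^{L}
+ \sum_{k=m-N}^{m+L-N-1} s_{m+L}(k)\, s_{m+L}(k-i)
\tag{1f}
$$

or 

$$
r_{m+L}(i) = \alpha^{2L}\, r_m(i) + \sum_{k=m-N}^{m+L-N-1} s_{m+L}(k)\, s_{m+L}(k-i) .
\tag{1g}
$$

Therefore, $r_{m+L}(i)$ can be calculated recursively from $r_m(i)$ using equation (1g). This newly 
calculated $r_{m+L}(i)$ is stored back to memory for use in the following adaptation cycle. The 
autocorrelation coefficient $R_{m+L}(i)$ is then calculated as 

$$
R_{m+L}(i) = r_{m+L}(i) + \sum_{k=m+L-N}^{m+L-1} s_{m+L}(k)\, s_{m+L}(k-i) .
\tag{1h}
$$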

So far we have described in a general manner the principles of a hybrid window calculation 
procedure. The parameter values for the hybrid windowing module 36 in Figure 4(a)/G.728 are M 
= 10, L = 20, N = 30, and $\alpha = (1/2)^{1/40} = 0.982820598$ (so that $\alpha^{2L} = 1/2$). 

Once the 11 autocorrelation coefficients $R(i)$, $i = 0, 1, \ldots, 10$ are calculated by the hybrid 
windowing procedure described above, a "white noise correction" procedure is applied. This is 
done by increasing the energy R(0) by a small amount: 

$$
R(0) \leftarrow \left( \frac{257}{256} \right) R(0)
\tag{1i}
$$

This has the effect of filling the spectral valleys with white noise so as to reduce the spectral 
dynamic range and alleviate ill-conditioning of the subsequent Levinson-Durbin recursion. The 
white noise correction factor (WNCF) of 257/256 corresponds to a white noise level about 24 dB 
below the average speech power. 
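
The recursive hybrid-window update and the white noise correction described above can be sketched as 
follows. This is an illustrative sketch rather than the recommendation's pseudo-code. The window constants 
`b` and `c` are passed in as free parameters because the actual window samples are specified in Annex A, 
which is not reproduced here; the function names are likewise hypothetical. The signal is assumed to be 
zero before sample index 0, and `direct_autocorr` exists only as a reference for checking the recursion. 

```python
import math


def window(k, m, N, alpha, b, c):
    """Hybrid window w_m(k) for an adaptation cycle that starts at sample m."""
    if k <= m - N - 1:                      # recursive (exponential) portion
        return b * alpha ** (m - N - 1 - k)
    if k <= m - 1:                          # non-recursive (sine) portion
        return -math.sin(c * (k - m))
    return 0.0                              # current and future samples


def direct_autocorr(s, m, M, N, alpha, b, c):
    """R_m(i) computed directly from the definition (reference only)."""
    sw = [s[k] * window(k, m, N, alpha, b, c) for k in range(m)]
    return [sum(sw[k] * sw[k - i] for k in range(i, m)) for i in range(M + 1)]


def windowed_sum(s, m, lo, hi, i, N, alpha, b, c):
    """Sum of s_m(k) * s_m(k-i) for lo <= k < hi; samples before 0 are zero."""
    total = 0.0
    for k in range(max(lo, i), hi):
        total += (s[k] * window(k, m, N, alpha, b, c)
                  * s[k - i] * window(k - i, m, N, alpha, b, c))
    return total


def next_cycle(r, s, m, L, M, N, alpha, b, c, wncf=257.0 / 256.0):
    """Advance from the cycle starting at m to the one starting at m + L.

    Returns the updated recursive component and the white-noise-corrected
    autocorrelation coefficients for the new cycle.
    """
    m2 = m + L
    # Scale the stored recursive component by alpha^(2L) and add the L
    # samples that have just moved into the recursive portion.
    r2 = [alpha ** (2 * L) * r[i]
          + windowed_sum(s, m2, m - N, m2 - N, i, N, alpha, b, c)
          for i in range(M + 1)]
    # Add the finite non-recursive sum over the N most recent samples.
    R = [r2[i] + windowed_sum(s, m2, m2 - N, m2, i, N, alpha, b, c)
         for i in range(M + 1)]
    R[0] *= wncf                            # white noise correction
    return r2, R
```

Starting from `r = [0.0] * (M + 1)` at `m = N` (where the recursive portion covers only zero samples), 
repeated calls to `next_cycle` should reproduce the directly computed coefficients to within rounding 
error while touching only L + N samples per cycle. 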

Next, using the white noise corrected autocorrelation coefficients, the Levinson-Durbin 
recursion module 37 recursively computes the predictor coefficients from order 1 to order 10. Let 
the j-th coefficient of the i-th order predictor be $a_j^{(i)}$. Then, the recursive procedure can be 
specified as follows: 

$$ E(0) = R(0) \tag{2a} $$

$$ k_i = -\frac{R(i) + \sum_{j=1}^{i-1} a_j^{(i-1)} R(i-j)}{E(i-1)} \tag{2b} $$

$$ a_i^{(i)} = k_i \tag{2c} $$

$$ a_j^{(i)} = a_j^{(i-1)} + k_i\, a_{i-j}^{(i-1)}, \quad 1 \le j \le i-1 \tag{2d} $$

$$ E(i) = (1 - k_i^2)\, E(i-1) . \tag{2e} $$

Equations (2b) through (2e) are evaluated recursively for $i = 1, 2, \ldots, 10$, and the final solution is 
given by 

$$ q_i = a_i^{(10)}, \quad 1 \le i \le 10 . \tag{2f} $$
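
The recursion of equations (2a) through (2f) can be sketched in a few lines. This is a minimal 
floating-point illustration, not the recommendation's pseudo-code; the function name `levinson_durbin` is 
chosen here, and the sketch omits the safeguards (for example against a non-positive prediction error) 
that a production implementation would need. 

```python
def levinson_durbin(R, order):
    """Return (coefficients [a_1..a_order], final prediction error E).

    R is the list of autocorrelation coefficients R(0)..R(order).
    Sign convention: the prediction-error filter is 1 + sum a_i z^-i.
    """
    E = R[0]                                  # (2a)
    a = [0.0] * (order + 1)                   # a[j] holds a_j; a[0] unused
    for i in range(1, order + 1):
        acc = R[i] + sum(a[j] * R[i - j] for j in range(1, i))
        k = -acc / E                          # (2b) reflection coefficient
        prev = a[:]                           # coefficients of order i-1
        a[i] = k                              # (2c)
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]  # (2d)
        E *= (1.0 - k * k)                    # (2e)
    return a[1:], E
```

For the weighting filter adapter the call would be `levinson_durbin(R, 10)` on the 11 white-noise-corrected 
coefficients. As a small check, the autocorrelation sequence 1, 0.5, 0.25 of a first-order process yields 
coefficients -0.5 and 0 with a final prediction error of 0.75. 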

If we define $q_0 = 1$, then the 10-th order "prediction-error filter" (sometimes called "analysis 
filter") has the transfer function 

$$ \tilde{Q}(z) = \sum_{i=0}^{10} q_i z^{-i} , \tag{3a} $$

and the corresponding 10-th order linear predictor is defined by the following transfer function 

$$ Q(z) = -\sum_{i=1}^{10} q_i z^{-i} . \tag{3b} $$

The weighting filter coefficient calculator (block 38) calculates the perceptual weighting filter 
coefficients according to the following equations: 

$$ W(z) = \frac{1 - Q(z/\gamma_1)}{1 - Q(z/\gamma_2)}, \quad 0 < \gamma_2 < \gamma_1 \le 1 , \tag{4a} $$

$$ Q(z/\gamma_1) = -\sum_{i=1}^{10} (q_i \gamma_1^{\,i})\, z^{-i} , \tag{4b} $$

and 

$$ Q(z/\gamma_2) = -\sum_{i=1}^{10} (q_i \gamma_2^{\,i})\, z^{-i} . \tag{4c} $$

The perceptual weighting filter is a 10-th order pole-zero filter defined by the transfer function 
W(z) in equation (4a). The values of $\gamma_1$ and $\gamma_2$ are 0.9 and 0.6, respectively. 
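
Equations (4a) through (4c) amount to scaling the i-th predictor coefficient by the i-th power of a constant. 
The sketch below illustrates this; the function name `weighting_filter_coeffs` and the returned 
numerator/denominator layout are assumptions made for the example. 

```python
def weighting_filter_coeffs(q, gamma1=0.9, gamma2=0.6):
    """Coefficients of W(z) = [1 - Q(z/gamma1)] / [1 - Q(z/gamma2)].

    q is the list [q_1..q_10] from the Levinson-Durbin recursion.
    Each returned list holds the coefficients of z^0, z^-1, z^-2, ...
    Since Q(z/g) = -sum q_i g^i z^-i, the polynomial 1 - Q(z/g) has
    coefficients 1, q_1*g, q_2*g^2, ...
    """
    numerator = [1.0] + [qi * gamma1 ** i for i, qi in enumerate(q, start=1)]
    denominator = [1.0] + [qi * gamma2 ** i for i, qi in enumerate(q, start=1)]
    return numerator, denominator
```

Setting both constants to zero leaves only the leading 1 in the numerator and the denominator, so the 
filter reduces to W(z) = 1. 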

Now refer to Figure 2/G.728. The perceptual weighting filter adapter (block 3) periodically 
updates the coefficients of W(z) according to equations (2) through (4), and feeds the coefficients 
to the impulse response vector calculator (block 12) and the perceptual weighting filters (blocks 4 
and 10). 

3.4 Perceptual Weighting Filter 

In Figure 2/G.728, the current input speech vector s(n) is passed through the perceptual 
weighting filter (block 4), resulting in the weighted speech vector v(n). Note that except during 
initialization, the filter memory (i.e., internal state variables, or the values held in the delay units 
of the filter) should not be reset to zero at any time. On the other hand, the memory of the 
perceptual weighting filter (block 10) will need special handling as described later. 

3.4.1 Non-speech Operation 

For modem signals or other non-speech signals, CCITT test results indicate that it is desirable 
to disable the perceptual weighting filter. This is equivalent to setting W(z) = 1. This can most 
easily be accomplished if $\gamma_1$ and $\gamma_2$ in equation (4a) are set equal to zero. The nominal values for 
these variables in the speech mode are 0.9 and 0.6, respectively. 

3.5 Synthesis Filter 

In Figure 2/G.728, there are two synthesis filters (blocks 9 and 22) with identical coefficients. 
Both filters are updated by the backward synthesis filter adapter (block 23). Each synthesis filter 
is a 50-th order all-pole filter that consists of a feedback loop with a 50-th order LPC predictor in 
the feedback branch. The transfer function of the synthesis filter is $F(z) = 1/[1 - P(z)]$, where P(z) 
is the transfer function of the 50-th order LPC predictor. 

After the weighted speech vector v(n) has been obtained, a zero-input response vector r(n) 
will be generated using the synthesis filter (block 9) and the perceptual weighting filter (block 10). 
To accomplish this, we first open the switch 5, i.e., point it to node 6. This implies that the signal 
going from node 7 to the synthesis filter 9 will be zero. We then let the synthesis filter 9 and the 
perceptual weighting filter 10 "ring" for 5 samples (1 vector). This means that we continue the 
filtering operation for 5 samples with a zero signal applied at node 7. The resulting output of the 
perceptual weighting filter 10 is the desired zero-input response vector r(n). 

Note that except for the vector right after initialization, the memory of the filters 9 and 10 is in 
general non-zero; therefore, the output vector r(n) is also non-zero in general, even though the 
filter input from node 7 is zero. In effect, this vector r(n) is the response of the two filters to 
previous gain-scaled excitation vectors e(n-1), e(n-2), ... This vector actually represents the 
effect due to filter memory up to time (n-1). 
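
The idea of letting a filter "ring" can be illustrated with the all-pole synthesis filter alone. The sketch 
below is a simplification made for illustration: it omits the perceptual weighting filter that follows the 
synthesis filter in the cascade described above, and the function name and memory layout (most recent 
output first) are assumptions. 

```python
def zero_input_response(a, memory, num_samples=5):
    """Output of an all-pole filter 1 / (1 + sum a_i z^-i) driven by zeros.

    a      : predictor-error coefficients [a_1..a_p]
    memory : the p most recent past outputs, most recent first
    The caller's memory list is not modified.
    """
    state = list(memory)
    out = []
    for _ in range(num_samples):
        # With zero input, each output depends only on past outputs.
        y = -sum(ai * yi for ai, yi in zip(a, state))
        out.append(y)
        state = [y] + state[:-1]
    return out
```

With zero memory the response is all zeros, as is the case for the vector right after initialization. With 
non-zero memory the response decays according to the filter poles: a single coefficient of -0.5 and a past 
output of 1.0 give 0.5, 0.25, 0.125, 0.0625, 0.03125. 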

3.6 VQ Target Vector Computation 

This block subtracts the zero-input response vector r(n) from the weighted speech vector v(n) 
to obtain the VQ codebook search target vector x(n). 

3.7 Backward Synthesis Filter Adapter 

This adapter 23 updates the coefficients of the synthesis filters 9 and 22. It takes the quantized 
(synthesized) speech as input and produces a set of synthesis filter coefficients as output. Its 
operation is quite similar to the perceptual weighting filter adapter 3. 

A blown-up version of this adapter is shown in Figure 5/G.728. The operation of the hybrid 
windowing module 49 and the Levinson-Durbin recursion module 50 is exactly the same as their 
counterparts (36 and 37) in Figure 4(a)/G.728, except for the following three differences: 

a. The input signal is now the quantized speech rather than the unquantized input speech. 

b. The predictor order is 50 rather than 10. 

c. The hybrid window parameters are different: N = 35, $\alpha = (3/4)^{1/40} = 0.992833749$. 

Note that the update period is still L = 20, and the white noise correction factor is still 257/256 = 
1.00390625. 

Let $\hat{P}(z)$ be the transfer function of the 50-th order LPC predictor, then it has the form 

$$ \hat{P}(z) = -\sum_{i=1}^{50} \hat{a}_i z^{-i} , \tag{5} $$

where $\hat{a}_i$'s are the predictor coefficients. To improve robustness to channel errors, these 
coefficients are modified so that the peaks in the resulting LPC spectrum have slightly larger 
bandwidths. The bandwidth expansion module 51 performs this bandwidth expansion procedure 
in the following way. Given the LPC predictor coefficients $\hat{a}_i$'s, a new set of coefficients $a_i$'s is 
computed according to 

$$ a_i = \lambda^{i}\, \hat{a}_i , \quad i = 1, 2, \ldots, 50, \tag{6} $$

where $\lambda$ is given by 

$$ \lambda = \frac{253}{256} = 0.98828125 . \tag{7} $$

This has the effect of moving all the poles of the synthesis filter radially toward the origin by a 
factor of $\lambda$. Since the poles are moved away from the unit circle, the peaks in the frequency 
response are widened. 
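
Bandwidth expansion as described by equation (6) is a per-coefficient scaling. The sketch below shows the 
operation; the function name `bandwidth_expand` is chosen for illustration, and the default factor is the 
value given in equation (7). 

```python
def bandwidth_expand(coeffs, lam=253.0 / 256.0):
    """Scale the i-th predictor coefficient (i = 1, 2, ...) by lam**i.

    coeffs is the list of coefficients before expansion, first
    coefficient first; a new list is returned.
    """
    return [c * lam ** i for i, c in enumerate(coeffs, start=1)]
```

For a first-order filter with coefficient -0.9 (a pole at radius 0.9 under the sign convention of equation 
(5)), the expanded coefficient is -0.9 times 0.98828125, so the pole moves to a radius of about 0.889. The 
same helper with `lam=0.96` would correspond to the stronger expansion mentioned for the modified encoder 
adapter in the main description. 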

After such bandwidth expansion, the modified LPC predictor has a transfer function of 

$$ P(z) = -\sum_{i=1}^{50} a_i z^{-i} . \tag{8} $$

The modified coefficients are then fed to the synthesis filters 9 and 22. These coefficients are also 
fed to the impulse response vector calculator 12. 

The synthesis filters 9 and 22 both have a transfer function of 

$$ F(z) = \frac{1}{1 - P(z)} . \tag{9} $$
Similar to the perceptual weighting filter, the synthesis filters 9 and 22 are also updated once 
every 4 vectors, and the updates also occur at the third speech vector of every 4-vector adaptation 
cycle. However, the updates are based on the quantized speech up to the last vector of the 
previous adaptation cycle. In other words, a delay of 2 vectors is introduced before the updates 
take place. This is because the Levinson-Durbin recursion module 50 and the energy table 
calculator 15 (described later) are computationally intensive. As a result, even though the 
autocorrelation of previously quantized speech is available at the first vector of each 4-vector 
cycle, computations may require more than one vector worth of time. Therefore, to maintain a 
basic buffer size of 1 vector (so as to keep the coding delay low), and to maintain real-time 
operation, a 2-vector delay in filter updates is introduced in order to facilitate real-time 
implementation. 

3.8 Backward Vector Gain Adapter 

This adapter updates the excitation gain $\sigma(n)$ for every vector time index n. The excitation 
gain $\sigma(n)$ is a scaling factor used to scale the selected excitation vector y(n). The adapter 20 takes 
the gain-scaled excitation vector e(n) as its input, and produces an excitation gain $\sigma(n)$ as its 
output. Basically, it attempts to "predict" the gain of e(n) based on the gains of e(n-1), e(n-2), ... 
by using adaptive linear prediction in the logarithmic gain domain. This backward vector gain 
adapter 20 is shown in more detail in Figure 6/G.728. 

Refer to Figure 6/G.728. This gain adapter operates as follows. The 1-vector delay unit 67 
makes the previous gain-scaled excitation vector e(n-1) available. The Root-Mean-Square 
(RMS) calculator 39 then calculates the RMS value of the vector e(n-1). Next, the logarithm 
calculator 40 calculates the dB value of the RMS of e(n-1), by first computing the base 10 
logarithm and then multiplying the result by 20. 

In Figure 6/G.728, a log-gain offset value of 32 dB is stored in the log-gain offset value holder 
41. This value is meant to be roughly equal to the average excitation gain level (in dB) during 
voiced speech. The adder 42 subtracts this log-gain offset value from the logarithmic gain 
produced by the logarithm calculator 40. The resulting offset-removed logarithmic gain $\delta(n-1)$ is 
then used by the hybrid windowing module 43 and the Levinson-Durbin recursion module 44. 
Again, blocks 43 and 44 operate in exactly the same way as blocks 36 and 37 in the perceptual 
weighting filter adapter module (Figure 4(a)/G.728), except that the hybrid window parameters are 
different and that the signal under analysis is now the offset-removed logarithmic gain rather than 
the input speech. (Note that only one gain value is produced for every 5 speech samples.) The 
hybrid window parameters of block 43 are M = 10, N = 20, L = 4, $\alpha = (3/4)^{1/8} = 0.96467863$. 

The output of the Levinson-Durbin recursion module 44 is the coefficients of a 10-th order
linear predictor with a transfer function of

$$\hat{R}(z) = -\sum_{i=1}^{10} \hat{\alpha}_i z^{-i} . \tag{10}$$

The bandwidth expansion module 45 then moves the roots of this polynomial radially toward the
z-plane origin in a way similar to the module 51 in Figure 5/G.728. The resulting
bandwidth-expanded gain predictor has a transfer function of

$$R(z) = -\sum_{i=1}^{10} \alpha_i z^{-i} , \tag{11}$$

where the coefficients α_i's are computed as

$$\alpha_i = (0.90625)^i \, \hat{\alpha}_i . \tag{12}$$

Such bandwidth expansion makes the gain adapter (block 20 in Figure 2/G.728) more robust to
channel errors. These α_i's are then used as the coefficients of the log-gain linear predictor (block
46 of Figure 6/G.728).

This predictor 46 is updated once every 4 speech vectors, and the updates take place at the
second speech vector of every 4-vector adaptation cycle. The predictor attempts to predict δ(n)
based on a linear combination of δ(n-1), δ(n-2), ..., δ(n-10). The predicted version of δ(n) is
denoted as δ̂(n) and is given by

$$\hat{\delta}(n) = -\sum_{i=1}^{10} \alpha_i \, \delta(n-i) . \tag{13}$$



After δ̂(n) has been produced by the log-gain linear predictor 46, we add back the log-gain
offset value of 32 dB stored in 41. The log-gain limiter 47 then checks the resulting log-gain value
and clips it if the value is unreasonably large or unreasonably small. The lower and upper limits
are set to 0 dB and 60 dB, respectively. The gain limiter output is then fed to the inverse
logarithm calculator 48, which reverses the operation of the logarithm calculator 40 and converts
the gain from the dB value to the linear domain. The gain limiter ensures that the gain in the
linear domain is between 1 and 1000.
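
The log-gain path described in this section can be sketched in a few lines of Python. This is only an
illustrative sketch, not the fixed-point procedure of the Recommendation: the function names, the small
floor on the RMS value, and the assumption that the predictor output is minus the weighted sum of the
past offset-removed log-gains are ours.

```python
import math

LOG_GAIN_OFFSET_DB = 32.0  # value held in the log-gain offset holder 41
LOG_GAIN_MIN_DB = 0.0      # lower limit of the log-gain limiter 47
LOG_GAIN_MAX_DB = 60.0     # upper limit of the log-gain limiter 47


def offset_removed_log_gain(e_prev):
    """dB value of the RMS of e(n-1), with the 32 dB offset removed."""
    rms = math.sqrt(sum(v * v for v in e_prev) / len(e_prev))
    rms = max(rms, 1e-10)  # assumption: floor keeps log10 defined for an all-zero vector
    return 20.0 * math.log10(rms) - LOG_GAIN_OFFSET_DB


def predicted_gain(alpha, delta_past):
    """Linear-domain excitation gain from past offset-removed log-gains.

    alpha[i-1] is the i-th predictor coefficient and delta_past[i-1] is the
    offset-removed log-gain i vectors back (i = 1..10).
    """
    delta_hat = -sum(a * d for a, d in zip(alpha, delta_past))   # assumed sign convention
    log_gain = delta_hat + LOG_GAIN_OFFSET_DB                    # add the offset back
    log_gain = min(max(log_gain, LOG_GAIN_MIN_DB), LOG_GAIN_MAX_DB)  # limiter 47
    return 10.0 ** (log_gain / 20.0)                             # inverse logarithm
```

Because the limiter acts on the dB value before the inverse logarithm, the returned gain always lies
between 10^(0/20) = 1 and 10^(60/20) = 1000.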

3.9 Codebook Search Module

In Figure 2/G.728, blocks 12 through 18 constitute a codebook search module 24. This 
module searches through the 1024 candidate code vectors in the excitation VQ codebook 19 and 
identifies the index of the best codevector which gives a corresponding quantized speech vector 
that is closest to the input speech vector. 

To reduce the codebook search complexity, the 10-bit, 1024-entry codebook is decomposed 
into two smaller codebooks: a 7-bit "shape codebook" containing 128 independent codevectors 
and a 3-bit "gain codebook" containing 8 scalar values that are symmetric with respect to zero
(i.e., one bit for sign, two bits for magnitude). The final output codevector is the product of the 
best shape codevector (from the 7-bit shape codebook) and the best gain level (from the 3-bit gain 
codebook). The 7-bit shape codebook table and the 3-bit gain codebook table are given in Annex 
B. 

3.9.1 Principle of Codebook Search

In principle, the codebook search module 24 scales each of the 1024 candidate codevectors by
the current excitation gain σ(n) and then passes the resulting 1024 vectors one at a time through a
cascaded filter consisting of the synthesis filter F(z) and the perceptual weighting filter W(z). The
filter memory is initialized to zero each time the module feeds a new codevector to the cascaded
filter with transfer function H(z) = F(z)W(z).

The filtering of VQ codevectors can be expressed in terms of matrix-vector multiplication.
Let y_j be the j-th codevector in the 7-bit shape codebook, and let g_i be the i-th level in the 3-bit
gain codebook. Let {h(n)} denote the impulse response sequence of the cascaded filter. Then,
when the codevector specified by the codebook indices i and j is fed to the cascaded filter H(z), the
filter output can be expressed as

$$\tilde{x}_{ij} = H \, \sigma(n) \, g_i \, y_j , \tag{14}$$

where

$$H = \begin{bmatrix}
h(0) & 0 & 0 & 0 & 0 \\
h(1) & h(0) & 0 & 0 & 0 \\
h(2) & h(1) & h(0) & 0 & 0 \\
h(3) & h(2) & h(1) & h(0) & 0 \\
h(4) & h(3) & h(2) & h(1) & h(0)
\end{bmatrix} . \tag{15}$$



The codebook search module 24 searches for the best combination of indices i and j which
minimizes the following Mean-Squared Error (MSE) distortion:

$$D = \| x(n) - \tilde{x}_{ij} \|^2 = \sigma^2(n) \, \| \hat{x}(n) - g_i H y_j \|^2 , \tag{16}$$

where x̂(n) = x(n)/σ(n) is the gain-normalized VQ target vector. Expanding the terms gives us

$$D = \sigma^2(n) \left[ \| \hat{x}(n) \|^2 - 2 g_i \, \hat{x}^T(n) H y_j + g_i^2 \, \| H y_j \|^2 \right] . \tag{17}$$

Since the term ||x̂(n)||² and the value of σ²(n) are fixed during the codebook search,
minimizing D is equivalent to minimizing

$$\hat{D} = -2 g_i \, p^T(n) \, y_j + g_i^2 E_j , \tag{18}$$

where

$$p(n) = H^T \hat{x}(n) \tag{19}$$

and

$$E_j = \| H y_j \|^2 . \tag{20}$$

Note that E_j is actually the energy of the j-th filtered shape codevector and does not depend
on the VQ target vector x̂(n). Also note that the shape codevector y_j is fixed, and the matrix H
only depends on the synthesis filter and the weighting filter, which are fixed over a period of 4
speech vectors. Consequently, E_j is also fixed over a period of 4 speech vectors. Based on this
observation, when the two filters are updated, we can compute and store the 128 possible energy
terms E_j, j = 0, 1, 2, ..., 127 (corresponding to the 128 shape codevectors) and then use these
energy terms repeatedly for the codebook search during the next 4 speech vectors. This
arrangement reduces the codebook search complexity.

For further reduction in computation, we can precompute and store the two arrays

$$b_i = 2 g_i \tag{21}$$

and

$$c_i = g_i^2 \tag{22}$$

for i = 0, 1, ..., 7. These two arrays are fixed since the g_i's are fixed. We can now express D̂ as

$$\hat{D} = -b_i P_j + c_i E_j , \tag{23}$$

where P_j = pᵀ(n) y_j.
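
As a sanity check on this derivation, the following Python sketch evaluates the distortion directly
from its definition and again through the reduced form built from b_i, c_i, P_j and E_j. The helper
names and the numeric inputs in the usage note are hypothetical; only the algebra comes from the text.

```python
def lower_toeplitz(h):
    """Lower-triangular Toeplitz matrix H built from the impulse response h(0..4)."""
    n = len(h)
    return [[h[r - c] if r >= c else 0.0 for c in range(n)] for r in range(n)]


def matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def direct_distortion(h, sigma, x, g, y):
    """D = || x - sigma * g * H y ||^2, straight from the definition."""
    hy = matvec(lower_toeplitz(h), y)
    return sum((xi - sigma * g * v) ** 2 for xi, v in zip(x, hy))


def reduced_distortion(h, sigma, x, g, y):
    """D_hat = -b * P + c * E with b = 2g, c = g^2, P = p^T y, E = ||H y||^2."""
    big_h = lower_toeplitz(h)
    x_hat = [xi / sigma for xi in x]                 # gain-normalized target
    h_transposed = [list(col) for col in zip(*big_h)]
    p = matvec(h_transposed, x_hat)                  # p = H^T x_hat
    hy = matvec(big_h, y)
    return -(2.0 * g) * dot(p, y) + (g * g) * dot(hy, hy)
```

For any inputs, `direct_distortion` should equal σ² times the sum of ||x̂||² and `reduced_distortion`,
which is why dropping the fixed terms does not change which index pair wins the search.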

Note that once the E_j, b_i, and c_i tables are precomputed and stored, the inner product term
P_j = pᵀ(n) y_j, which solely depends on j, takes most of the computation in determining D̂. Thus,
the codebook search procedure steps through the shape codebook and identifies the best gain
index i for each shape codevector y_j.

There are several ways to find the best gain index i for a given shape codevector y_j.

a. The first and the most obvious way is to evaluate the 8 possible D̂ values corresponding to
the 8 possible values of i, and then pick the index i which corresponds to the smallest D̂.
However, this requires 2 multiplications for each i.

b. A second way is to compute the optimal gain ĝ = P_j/E_j first, and then quantize this gain ĝ to
one of the 8 gain levels {g_0, ..., g_7} in the 3-bit gain codebook. The best index i is the index
of the gain level g_i which is closest to ĝ. However, this approach requires a division
operation for each of the 128 shape codevectors, and division is typically very inefficient to
implement using DSP processors.

c. A third approach, which is a slightly modified version of the second approach, is
particularly efficient for DSP implementations. The quantization of ĝ can be thought of as a
series of comparisons between ĝ and the "quantizer cell boundaries", which are the
mid-points between adjacent gain levels. Let d_i be the mid-point between gain levels g_i and
g_{i+1} that have the same sign. Then, testing "ĝ < d_i?" is equivalent to testing "P_j < d_i E_j?".
Therefore, by using the latter test, we can avoid the division operation and still require only
one multiplication for each index i. This is the approach used in the codebook search. The
gain quantizer cell boundaries d_i's are fixed and can be precomputed and stored in a table.
For the 8 gain levels, actually only 6 boundary values d_0, d_1, d_2, d_4, d_5, and d_6 are used.

Once the best indices i and j are identified, they are concatenated to form the output of the
codebook search module: a single 10-bit best codebook index.

3.9.2 Operation of Codebook Search Module

With the codebook search principle introduced, the operation of the codebook search module
24 is now described below. Refer to Figure 2/G.728. Every time the synthesis filter 9 and
the perceptual weighting filter 10 are updated, the impulse response vector calculator 12 computes
the first 5 samples of the impulse response of the cascaded filter F(z)W(z). To compute the
impulse response vector, we first set the memory of the cascaded filter to zero, then excite the filter
with an input sequence {1, 0, 0, 0, 0}. The corresponding 5 output samples of the filter are h(0),
h(1), ..., h(4), which constitute the desired impulse response vector. After this impulse response
vector is computed, it will be held constant and used in the codebook search for the following 4
speech vectors, until the filters 9 and 10 are updated again.

Next, the shape codevector convolution module 14 computes the 128 vectors H y_j, j = 0, 1, 2,
..., 127. In other words, it convolves each shape codevector y_j, j = 0, 1, 2, ..., 127 with the impulse
response sequence h(0), h(1), ..., h(4), where the convolution is only performed for the first 5
samples. The energies of the resulting 128 vectors are then computed and stored by the energy
table calculator 15 according to equation (20). The energy of a vector is defined as the sum of the
squared value of each vector component.
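
A minimal Python sketch of blocks 14 and 15 follows, assuming the impulse response h(0), ..., h(4)
is already available as a list. The function names are hypothetical, and a real implementation would
use the fixed-point arithmetic of the Recommendation rather than floating point.

```python
def truncated_convolve(h, y):
    """First len(y) samples of the convolution of h and y.

    This equals the product H y when H is the lower-triangular Toeplitz
    matrix built from h.
    """
    return [sum(h[k - m] * y[m] for m in range(k + 1)) for k in range(len(y))]


def energy_table(h, shapes):
    """Energy (sum of squared components) of each filtered shape codevector."""
    return [sum(v * v for v in truncated_convolve(h, y)) for y in shapes]
```

With a 128-entry `shapes` list this produces the 128 energy terms that are reused for the next 4
speech vectors.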

Note that the computations in blocks 12, 14, and 15 are performed only once every 4 speech
vectors, while the other blocks in the codebook search module perform computations for each
speech vector. Also note that the update of the E_j table is synchronized with the update of the
synthesis filter coefficients. That is, the new E_j table will be used starting from the third speech
vector of every adaptation cycle. (Refer to the discussion in Section 3.7.)

The VQ target vector normalization module 16 calculates the gain-normalized VQ target
vector x̂(n) = x(n)/σ(n). In DSP implementations, it is more efficient to first compute 1/σ(n), and
then multiply each component of x(n) by 1/σ(n).

Next, the time-reversed convolution module 13 computes the vector p(n) = Hᵀ x̂(n). This
operation is equivalent to first reversing the order of the components of x̂(n), then convolving the
resulting vector with the impulse response vector, and then reversing the component order of the
output again (hence the name "time-reversed convolution").
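
The equivalence between the transposed matrix product and the reverse, convolve, reverse procedure
can be checked with a short Python sketch. The function names are hypothetical and the vectors are
treated as plain floating-point lists.

```python
def truncated_convolve(h, v):
    """First len(v) samples of the convolution of h and v."""
    return [sum(h[k - m] * v[m] for m in range(k + 1)) for k in range(len(v))]


def time_reversed_convolve(h, x_hat):
    """Reverse x_hat, convolve with h, then reverse the result."""
    return truncated_convolve(h, x_hat[::-1])[::-1]


def transpose_product(h, x_hat):
    """Direct evaluation of H^T x_hat for lower-triangular Toeplitz H."""
    n = len(x_hat)
    return [sum(h[r - k] * x_hat[r] for r in range(k, n)) for k in range(n)]
```

Both functions return the same vector for any `h` and `x_hat` of equal length, so module 13 can
reuse an ordinary convolution routine.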

Once the E_j, b_i, and c_i tables are precomputed and stored, and the vector p(n) is also calculated,
then the error calculator 17 and the best codebook index selector 18 work together to perform the
following efficient codebook search algorithm.

a. Initialize D̂_min to a number larger than the largest possible value of D̂ (or use the largest
possible number of the DSP's number representation system).

b. Set the shape codebook index j = 0.

c. Compute the inner product P_j = pᵀ(n) y_j.

d. If P_j < 0, go to step h to search through negative gains; otherwise, proceed to step e to
search through positive gains.

e. If P_j < d_0 E_j, set i = 0 and go to step k; otherwise proceed to step f.

f. If P_j < d_1 E_j, set i = 1 and go to step k; otherwise proceed to step g.

g. If P_j < d_2 E_j, set i = 2 and go to step k; otherwise set i = 3 and go to step k.

h. If P_j > d_4 E_j, set i = 4 and go to step k; otherwise proceed to step i.

i. If P_j > d_5 E_j, set i = 5 and go to step k; otherwise proceed to step j.

j. If P_j > d_6 E_j, set i = 6; otherwise set i = 7.

k. Compute D̂ = -b_i P_j + c_i E_j.

l. If D̂ < D̂_min, then set D̂_min = D̂, i_min = i, and j_min = j.

m. If j < 127, set j = j + 1 and go to step c; otherwise proceed to step n.

n. When the algorithm proceeds to here, all 1024 possible combinations of gains and shapes
have been searched through. The resulting i_min and j_min are the desired channel indices for
the gain and the shape, respectively. The output best codebook index (10-bit) is the
concatenation of these two indices, and the corresponding best excitation codevector is
y(n) = g_{i_min} y_{j_min}. The selected 10-bit codebook index is transmitted through the
communication channel to the decoder.
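
The search steps a through n can be sketched in Python as follows. This is a floating-point
illustration only: the gain levels below are placeholders chosen for readability (the actual tables
are in Annex B), the index layout (0 to 3 positive, 4 to 7 negative) is inferred from the boundary
tests, and all function names are hypothetical.

```python
# Placeholder gain levels (assumption); the real values are tabulated in Annex B.
GAINS = [0.5, 1.0, 2.0, 4.0, -0.5, -1.0, -2.0, -4.0]
# Cell boundaries: mid-points between adjacent levels of the same sign.
BOUNDS = {i: 0.5 * (GAINS[i] + GAINS[i + 1]) for i in (0, 1, 2, 4, 5, 6)}
B = [2.0 * g for g in GAINS]   # b_i = 2 g_i
C = [g * g for g in GAINS]     # c_i = g_i^2


def best_gain_index(pj, ej):
    """Steps d-j: choose the gain index by comparisons only, with no division."""
    if pj >= 0.0:
        for i in (0, 1, 2):
            if pj < BOUNDS[i] * ej:
                return i
        return 3
    for i in (4, 5, 6):
        if pj > BOUNDS[i] * ej:
            return i
    return 7


def search(p, shapes, energies):
    """Steps a-n: return (i_min, j_min) minimizing D_hat = -b_i P_j + c_i E_j."""
    d_min, i_min, j_min = float("inf"), 0, 0          # step a
    for j, (y, ej) in enumerate(zip(shapes, energies)):
        pj = sum(a * b for a, b in zip(p, y))         # step c
        i = best_gain_index(pj, ej)                   # steps d-j
        d_hat = -B[i] * pj + C[i] * ej                # step k
        if d_hat < d_min:                             # step l
            d_min, i_min, j_min = d_hat, i, j
    return i_min, j_min


def pack_index(i, j):
    """10-bit codebook index: 7 shape bits followed by 3 gain bits."""
    return (j << 3) | i
```

Since D̂ can be rewritten as E_j (g_i - P_j/E_j)² minus a term that does not depend on i, the
comparison-based choice of i gives the same result as evaluating all 8 gain levels whenever E_j > 0.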






EP 0 673 018 A2 



3.10 Simulated Decoder 

Although the encoder has identified and transmitted the best codebook index so far, some
additional tasks have to be performed in preparation for the encoding of the following speech
vectors. First, the best codebook index is fed to the excitation VQ codebook to extract the
corresponding best codevector y(n) = g_{i_min} y_{j_min}. This best codevector is then scaled by the
current excitation gain σ(n) in the gain stage 21. The resulting gain-scaled excitation vector is
e(n) = σ(n) y(n).

This vector e(n) is then passed through the synthesis filter 22 to obtain the current quantized
speech vector s_q(n). Note that blocks 19 through 23 form a simulated decoder 8. Hence, the
quantized speech vector s_q(n) is actually the simulated decoded speech vector when there are no
channel errors. In Figure 2/G.728, the backward synthesis filter adapter 23 needs this quantized
speech vector s_q(n) to update the synthesis filter coefficients. Similarly, the backward vector gain
adapter 20 needs the gain-scaled excitation vector e(n) to update the coefficients of the log-gain
linear predictor.

One last task before proceeding to encode the next speech vector is to update the memory of
the synthesis filter 9 and the perceptual weighting filter 10. To accomplish this, we first save the
memory of filters 9 and 10 which was left over after performing the zero-input response
computation described in Section 3.5. We then set the memory of filters 9 and 10 to zero and
close the switch 5, i.e., connect it to node 7. Then, the gain-scaled excitation vector e(n) is passed
through the two zero-memory filters 9 and 10. Note that since e(n) is only 5 samples long and the
filters have zero memory, the number of multiply-adds only goes up from 0 to 4 for the 5-sample
period. This is a significant saving in computation since there would be 70 multiply-adds per
sample if the filter memory were not zero. Next, we add the saved original filter memory back to
the newly established filter memory after filtering e(n). This in effect adds the zero-input
responses to the zero-state responses of the filters 9 and 10. This results in the desired set of filter
memory which will be used to compute the zero-input response during the encoding of the next
speech vector.

Note that after the filter memory update, the top 5 elements of the memory of the synthesis
filter 9 are exactly the same as the components of the desired quantized speech vector s_q(n).
Therefore, we can actually omit the synthesis filter 22 and obtain s_q(n) from the updated memory
of the synthesis filter 9. This means an additional saving of 30 multiply-adds per sample.
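
The memory update relies on the linearity of the filters: the memory after filtering with the
original state equals the zero-input memory plus the zero-state memory. The Python sketch below
illustrates this for a generic all-pole filter only. The difference-equation sign convention, the
memory layout (most recent output first), and the function names are assumptions made for the
illustration, and the pole-zero weighting filter is not modelled.

```python
def run_all_pole(coeffs, memory, inputs):
    """Filter `inputs` through y(k) = x(k) - sum_i coeffs[i-1] * y(k-i).

    memory[0] holds y(k-1), memory[1] holds y(k-2), and so on.
    Returns (outputs, updated_memory) without modifying the arguments.
    """
    memory = list(memory)
    outputs = []
    for x in inputs:
        y = x - sum(a * m for a, m in zip(coeffs, memory))
        memory = [y] + memory[:-1]
        outputs.append(y)
    return outputs, memory


def update_memory(coeffs, saved_memory, excitation):
    """Add the saved zero-input memory to the memory of a zero-state run."""
    zero_state = [0.0] * len(coeffs)
    _, zero_state_memory = run_all_pole(coeffs, zero_state, excitation)
    return [s + z for s, z in zip(saved_memory, zero_state_memory)]
```

Here `saved_memory` stands for the state left behind by the zero-input response computation. With
this layout, the first 5 entries of the updated memory, read in reverse order, are the 5 samples of
the quantized speech vector, which is why a separate pass through synthesis filter 22 is unnecessary.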

The encoder operation described so far specifies the way to encode a single input speech
vector. The encoding of the entire speech waveform is achieved by repeating the above operation
for every speech vector.

3.11 Synchronization & In-band Signalling

In the above description of the encoder, it is assumed that the decoder knows the boundaries of
the received 10-bit codebook indices and also knows when the synthesis filter and the log-gain
predictor need to be updated (recall that they are updated once every 4 vectors). In practice, such
synchronization information can be made available to the decoder by adding extra
synchronization bits on top of the transmitted 16 kbit/s bit stream. However, in many applications
there is a need to insert synchronization or in-band signalling bits as part of the 16 kbit/s bit
stream. This can be done in the following way. Suppose a synchronization bit is to be inserted
once every N speech vectors; then, for every N-th input speech vector, we can search through only
half of the shape codebook and produce a 6-bit shape codebook index. In this way, we rob one bit
out of every N-th transmitted codebook index and insert a synchronization or signalling bit
instead.

It is important to note that we cannot arbitrarily rob one bit out of an already selected 7-bit
shape codebook index; instead, the encoder has to know which speech vectors will be robbed of one
bit and then search through only half of the codebook for those speech vectors. Otherwise, the
decoder will not have the same decoded excitation codevectors for those speech vectors.

Since the coding algorithm has a basic adaptation cycle of 4 vectors, it is reasonable to let N be
a multiple of 4 so that the decoder can easily determine the boundaries of the encoder adaptation
cycles. For a reasonable value of N (such as 16, which corresponds to a 10 millisecond bit
robbing period), the resulting degradation in speech quality is essentially negligible. In particular,
we have found that a value of N = 16 results in little additional distortion. The rate of this bit
robbing is only 100 bits/s.

If the above procedure is followed, we recommend that when the desired bit is to be a 0, only
the first half of the shape codebook be searched, i.e. those vectors with indices 0 to 63. When the
desired bit is a 1, then the second half of the codebook is searched and the resulting index will be
between 64 and 127. The significance of this choice is that the desired bit will be the leftmost bit
in the codeword, since the 7 bits for the shape codevector precede the 3 bits for the sign and gain
codebook. We further recommend that the synchronization bit be robbed from the last vector in a
cycle of 4 vectors. Once it is detected, the next codeword received can begin the new cycle of
codevectors.
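
The bit-robbing recommendation can be illustrated with a few lines of Python. The function names
are hypothetical; the sketch only shows how restricting the shape search to one half of the codebook
fixes the leftmost bit of the 10-bit codeword, and how the 100 bits/s figure follows from the
16 kbit/s rate.

```python
def shape_indices_to_search(sync_bit=None):
    """All 128 shape indices, or the half selected by the bit to be inserted."""
    if sync_bit is None:
        return range(0, 128)
    return range(64, 128) if sync_bit else range(0, 64)


def pack_codeword(shape_index, gain_index):
    """10-bit codeword: 7 shape bits followed by 3 sign-and-gain bits."""
    return (shape_index << 3) | gain_index


def leftmost_bit(codeword):
    return (codeword >> 9) & 1


def robbing_rate_bits_per_second(n_vectors, bit_rate=16000, bits_per_vector=10):
    """One robbed bit every n_vectors codewords."""
    return bit_rate / bits_per_vector / n_vectors
```

Whatever shape index the restricted search selects, `leftmost_bit` of the packed codeword equals the
inserted bit, so the decoder can read it without any side information.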

Although we state that synchronization causes very little distortion, we note that no formal
testing has been done on hardware which contained this synchronization strategy. Consequently,
the amount of the degradation has not been measured.

However, we specifically recommend against using the synchronization bit for
synchronization in systems in which the coder is turned on and off repeatedly. For example, a
system might use a speech activity detector to turn off the coder when no speech is present.
Each time the encoder was turned on, the decoder would need to locate the synchronization
sequence. At 100 bits/s, this would probably take several hundred milliseconds. In addition, time
must be allowed for the decoder state to track the encoder state. The combined result would be a
phenomenon known as front-end clipping in which the beginning of the speech utterance would be
lost. If the encoder and decoder are both started at the same instant as the onset of speech, then no
speech will be lost. This is only possible in systems using external signalling for the start-up
times and external synchronization.

4. LD-CELP DECODER PRINCIPLES

Figure 3/G.728 is a block schematic of the LD-CELP decoder. A functional description of
each block is given in the following sections.

4.1 Excitation VQ Codebook

This block contains an excitation VQ codebook (including shape and gain codebooks)
identical to the codebook 19 in the LD-CELP encoder. It uses the received best codebook index
to extract the best codevector y(n) selected in the LD-CELP encoder.

4.2 Gain Scaling Unit

This block computes the scaled excitation vector e(n) by multiplying each component of y(n)
by the gain σ(n).

4.3 Synthesis Filter

This filter has the same transfer function as the synthesis filter in the LD-CELP encoder
(assuming error-free transmission). It filters the scaled excitation vector e(n) to produce the
decoded speech vector s_d(n). Note that in order to avoid any possible accumulation of round-off
errors during decoding, sometimes it is desirable to exactly duplicate the procedures used in the
encoder to obtain s_q(n). If this is the case, and if the encoder obtains s_q(n) from the updated
memory of the synthesis filter 9, then the decoder should also compute s_d(n) as the sum of the
zero-input response and the zero-state response of the synthesis filter 32, as is done in the encoder.

4.4 Backward Vector Gain Adapter 

The function of this block is described in Section 3.8. 
4.5 Backward Synthesis Filter Adapter

The function of this block is described in Section 3.7. 
4.6 Postfilter 

This block filters the decoded speech to enhance the perceptual quality. This block is further
expanded in Figure 7/G.728 to show more details. Refer to Figure 7/G.728. The postfilter
basically consists of three major parts: (1) long-term postfilter 71, (2) short-term postfilter 72, and
(3) output gain scaling unit 77. The other four blocks in Figure 7/G.728 are just to calculate the
appropriate scaling factor for use in the output gain scaling unit 77.

The long-term postfilter 71, sometimes called the pitch postfilter, is a comb filter with its
spectral peaks located at multiples of the fundamental frequency (or pitch frequency) of the speech
to be postfiltered. The reciprocal of the fundamental frequency is called the pitch period. The
pitch period can be extracted from the decoded speech using a pitch detector (or pitch extractor).
Let p be the fundamental pitch period (in samples) obtained by a pitch detector; then the transfer
function of the long-term postfilter can be expressed as

$$H_l(z) = g_l \, (1 + b \, z^{-p}) , \tag{24}$$

where the coefficients g_l, b and the pitch period p are updated once every 4 speech vectors (an
adaptation cycle) and the actual updates occur at the third speech vector of each adaptation cycle.
For convenience, we will from now on call an adaptation cycle a frame. The derivation of g_l, b
and p will be described later in Section 4.7.

The short-term postfilter 72 consists of a 10th-order pole-zero filter in cascade with a
first-order all-zero filter. The 10th-order pole-zero filter attenuates the frequency components between
formant peaks, while the first-order all-zero filter attempts to compensate for the spectral tilt in the
frequency response of the 10th-order pole-zero filter.

Let ã_i, i = 1, 2, ..., 10 be the coefficients of the 10th-order LPC predictor obtained by backward
LPC analysis of the decoded speech, and let k_1 be the first reflection coefficient obtained by the
same LPC analysis. Then, both ã_i's and k_1 can be obtained as by-products of the 50th-order
backward LPC analysis (block 50 in Figure 5/G.728). All we have to do is to stop the 50th-order
Levinson-Durbin recursion at order 10, copy k_1 and ã_1, ã_2, ..., ã_10, and then resume the
Levinson-Durbin recursion from order 11 to order 50. The transfer function of the short-term postfilter is

$$H_s(z) = \frac{1 - \sum_{i=1}^{10} \bar{b}_i z^{-i}}{1 - \sum_{i=1}^{10} \bar{a}_i z^{-i}} \, [1 + \mu z^{-1}] , \tag{25}$$

where

$$\bar{b}_i = \tilde{a}_i (0.65)^i , \quad i = 1, 2, \ldots, 10 , \tag{26}$$

$$\bar{a}_i = \tilde{a}_i (0.75)^i , \quad i = 1, 2, \ldots, 10 , \tag{27}$$

and

$$\mu = (0.15) \, k_1 . \tag{28}$$

The coefficients ā_i's, b̄_i's, and μ are also updated once a frame, but the updates take place at the
first vector of each frame (i.e. as soon as ã_i's become available).
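
The coefficient computation of equations (26) through (28) amounts to scaling the i-th LPC
coefficient by the i-th power of a constant. A small Python sketch, with a hypothetical function name
and floating-point arithmetic in place of the fixed-point arithmetic a DSP would use:

```python
def short_term_postfilter_coeffs(a_tilde, k1):
    """Return (a_bar, b_bar, mu) from the 10th-order LPC coefficients and k_1.

    a_tilde[i-1] is the i-th LPC predictor coefficient (i = 1..10).
    """
    b_bar = [a * 0.65 ** i for i, a in enumerate(a_tilde, start=1)]  # equation (26)
    a_bar = [a * 0.75 ** i for i, a in enumerate(a_tilde, start=1)]  # equation (27)
    mu = 0.15 * k1                                                   # equation (28)
    return a_bar, b_bar, mu
```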

In general, after the decoded speech is passed through the long-term postfilter and the short- 
term postfilter, the filtered speech will not have the same power level as the decoded (unfiltered) 
speech. To avoid occasional large gain excursions, it is necessary to use automatic gain control to 
force the postfiltered speech to have roughly the same power as the unfiltered speech. This is 
done by blocks 73 through 77. 

The sum of absolute value calculator 73 operates vector-by-vector. It takes the current
decoded speech vector s_d(n) and calculates the sum of the absolute values of its 5 vector
components. Similarly, the sum of absolute value calculator 74 performs the same type of
calculation, but on the current output vector s_f(n) of the short-term postfilter. The scaling factor
calculator 75 then divides the output value of block 73 by the output value of block 74 to obtain a
scaling factor for the current s_f(n) vector. This scaling factor is then filtered by a first-order
lowpass filter 76 to get a separate scaling factor for each of the 5 components of s_f(n). The
first-order lowpass filter 76 has a transfer function of 0.01/(1 - 0.99 z^-1). The lowpass filtered scaling
factor is used by the output gain scaling unit 77 to perform sample-by-sample scaling of the
short-term postfilter output. Note that since the scaling factor calculator 75 only generates one
scaling factor per vector, it would have a stair-case effect on the sample-by-sample scaling
operation of block 77 if the lowpass filter 76 were not present. The lowpass filter 76 effectively
smoothes out such a stair-case effect.
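
The automatic gain control of blocks 73 through 77 can be sketched in Python as below. The function
name, the way the lowpass filter state is passed in and out, and the fallback to a unity factor when
the postfiltered vector is all zero are assumptions of this sketch, not details taken from the text.

```python
def agc_scale(decoded, postfiltered, state):
    """Scale one 5-sample postfilter output vector.

    `state` is the previous output of the lowpass filter 76.
    Returns (scaled_samples, new_state).
    """
    num = sum(abs(v) for v in decoded)        # block 73
    den = sum(abs(v) for v in postfiltered)   # block 74
    factor = num / den if den > 0.0 else 1.0  # block 75 (unity fallback is an assumption)
    scaled = []
    for v in postfiltered:
        state = 0.99 * state + 0.01 * factor  # block 76: 0.01 / (1 - 0.99 z^-1)
        scaled.append(state * v)              # block 77: sample-by-sample scaling
    return scaled, state
```

Because the same per-vector factor is fed to the lowpass filter for each of the 5 samples, the applied
gain drifts smoothly toward the new factor instead of jumping at vector boundaries.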

4.6.1 Non-speech Operation

CCITT objective test results indicate that for some non-speech
signals, the performance of the coder is improved when the adaptive postfilter is turned off. Since
the input to the adaptive postfilter is the output of the synthesis filter, this signal is always
available. In an actual implementation this unfiltered signal shall be output when the switch is set
to disable the postfilter.

4.7 Postfilter Adapter 

This block calculates and updates the coefficients of the postfilter once a frame. This postfilter 
adapter is further expanded in Figure 8/G.728. 

Refer to Figure 8/G.728. The 10th-order LPC inverse filter 81 and the pitch period extraction
module 82 work together to extract the pitch period from the decoded speech. In fact, any pitch
extractor with reasonable performance (and without introducing additional delay) may be used
here. What we describe here is only one possible way of implementing a pitch extractor.

The 10th-order LPC inverse filter 81 has a transfer function of

$$\tilde{A}(z) = 1 - \sum_{i=1}^{10} \tilde{a}_i z^{-i} , \tag{29}$$

where the coefficients ã_i's are supplied by the Levinson-Durbin recursion module (block 50 of
Figure 5/G.728) and are updated at the first vector of each frame. This LPC inverse filter takes the
decoded speech as its input and produces the LPC prediction residual sequence {d(k)} as its
output. We use a pitch analysis window size of 100 samples and a range of pitch period from 20
to 140 samples. The pitch period extraction module 82 maintains a long buffer to hold the last
240 samples of the LPC prediction residual. For indexing convenience, the 240 LPC residual
samples stored in the buffer are indexed as d(-139), d(-138), ..., d(100).

The pitch period extraction module 82 extracts the pitch period once a frame, and the pitch
period is extracted at the third vector of each frame. Therefore, the LPC inverse filter output
vectors should be stored into the LPC residual buffer in a special order: the LPC residual vector
corresponding to the fourth vector of the last frame is stored as d(81), d(82), ..., d(85), the LPC
residual of the first vector of the current frame is stored as d(86), d(87), ..., d(90), the LPC residual
of the second vector of the current frame is stored as d(91), d(92), ..., d(95), and the LPC residual of
the third vector is stored as d(96), d(97), ..., d(100). The samples d(-139), d(-138), ..., d(80) are
simply the previous LPC residual samples arranged in the correct time order.

Once the LPC residual buffer is ready, the pitch period extraction module 82 works in the
following way. First, the last 20 samples of the LPC residual buffer (d(81) through d(100)) are
lowpass filtered at 1 kHz by a third-order elliptic filter (coefficients given in Annex D) and then
4:1 decimated (i.e. down-sampled by a factor of 4). This results in 5 lowpass filtered and
decimated LPC residual samples, denoted d̄(21), d̄(22), ..., d̄(25), which are stored as the last 5
samples in a decimated LPC residual buffer. Besides these 5 samples, the other 55 samples
d̄(-34), d̄(-33), ..., d̄(20) in the decimated LPC residual buffer are obtained by shifting previous
frames of decimated LPC residual samples. The i-th correlation of the decimated LPC residual
samples is then computed as

$$\rho(i) = \sum_{n=1}^{25} \bar{d}(n) \, \bar{d}(n-i) \tag{30}$$

for time lags i = 5, 6, 7, ..., 35 (which correspond to pitch periods from 20 to 140 samples). The
time lag τ which gives the largest of the 31 calculated correlation values is then identified. Since
this time lag τ is the lag in the 4:1 decimated residual domain, the corresponding time lag which
gives the maximum correlation in the original undecimated residual domain should lie between
4τ-3 and 4τ+3. To get the original time resolution, we next use the undecimated LPC residual
buffer to compute the correlation of the undecimated LPC residual

$$C(i) = \sum_{k=1}^{100} d(k) \, d(k-i) \tag{31}$$

for 7 lags i = 4τ-3, 4τ-2, ..., 4τ+3. Out of the 7 time lags, the lag p_0 that gives the largest correlation
is identified.
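
The two-stage lag search (coarse in the decimated domain, then fine in the original domain) can be
sketched in Python as follows. The buffers are passed as index functions so that the negative indices
used in the text can be kept. The function names are hypothetical, the elliptic lowpass filter is
omitted, and clamping the fine lags to the 20 to 140 sample range is an assumption made here so that
the 240-sample buffer is never exceeded.

```python
def make_indexer(samples, first_index):
    """Return f(k) reading samples stored for indices first_index, first_index + 1, ..."""
    return lambda k: samples[k - first_index]


def coarse_lag(d_dec):
    """Equation (30): lag i in 5..35 maximizing the decimated-domain correlation."""
    def rho(i):
        return sum(d_dec(n) * d_dec(n - i) for n in range(1, 26))
    return max(range(5, 36), key=rho)


def fine_lag(d, tau, lo=20, hi=140):
    """Equation (31) over 4*tau-3 .. 4*tau+3, restricted to [lo, hi] (assumption)."""
    def corr(i):
        return sum(d(k) * d(k - i) for k in range(1, 101))
    lags = [i for i in range(4 * tau - 3, 4 * tau + 4) if lo <= i <= hi]
    return max(lags, key=corr)
```

Python's `max` returns the first maximal item, so when several lags tie (as happens for a perfectly
periodic residual and multiples of its period) the smallest such lag is returned.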

The time lag p_0 found this way may turn out to be a multiple of the true fundamental pitch
period. What we need in the long-term postfilter is the true fundamental pitch period, not any
multiple of it. Therefore, we need to do more processing to find the fundamental pitch period. We
make use of the fact that we estimate the pitch period quite frequently (once every 20 speech
samples). Since the pitch period typically varies between 20 and 140 samples, our frequent pitch
estimation means that, at the beginning of each talk spurt, we will first get the fundamental pitch
period before the multiple pitch periods have a chance to show up in the correlation peak-picking
process described above. From there on, we will have a chance to lock on to the fundamental
pitch period by checking to see if there is any correlation peak in the neighborhood of the pitch
period of the previous frame.

Let p̂ be the pitch period of the previous frame. If the time lag p_0 obtained above is not in the
neighborhood of p̂, then we also evaluate equation (31) for i = p̂-6, p̂-5, ..., p̂+5, p̂+6. Out of these
13 possible time lags, the time lag p_1 that gives the largest correlation is identified. We then test
to see if this new lag p_1 should be used as the output pitch period of the current frame. First, we
compute

$$\beta_0 = \frac{\sum_{k=1}^{100} d(k) \, d(k-p_0)}{\sum_{k=1}^{100} d(k-p_0) \, d(k-p_0)} , \tag{32}$$

which is the optimal tap weight of a single-tap pitch predictor with a lag of p_0 samples. The value
of β_0 is then clamped between 0 and 1. Next, we also compute

$$\beta_1 = \frac{\sum_{k=1}^{100} d(k) \, d(k-p_1)}{\sum_{k=1}^{100} d(k-p_1) \, d(k-p_1)} , \tag{33}$$

which is the optimal tap weight of a single-tap pitch predictor with a lag of p_1 samples. The value
of β_1 is then also clamped between 0 and 1. Then, the output pitch period p of block 82 is given
by

$$p = \begin{cases} p_0 & \text{if } \beta_1 \le 0.4 \, \beta_0 \\ p_1 & \text{if } \beta_1 > 0.4 \, \beta_0 \end{cases} \tag{34}$$
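
The tap-weight computation and the final choice between the two candidate lags can be sketched in
Python as follows. The residual is passed as an index function d(k), the function names are
hypothetical, and returning a zero weight when the denominator is zero is an assumption of this
sketch.

```python
def tap_weight(d, lag):
    """Optimal single-tap predictor weight over k = 1..100, clamped to [0, 1]."""
    num = sum(d(k) * d(k - lag) for k in range(1, 101))
    den = sum(d(k - lag) * d(k - lag) for k in range(1, 101))
    beta = num / den if den > 0.0 else 0.0  # assumption: zero energy gives zero weight
    return min(max(beta, 0.0), 1.0)


def choose_pitch(p0, p1, beta0, beta1):
    """Keep p0 unless the lag near the previous pitch period predicts well enough."""
    return p0 if beta1 <= 0.4 * beta0 else p1
```

The 0.4 factor biases the decision toward the lag near the previous frame's pitch period: that lag
is selected even when its tap weight is considerably smaller than that of p_0.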



After the pitch period extraction module 82 extracts the pitch period p, the pitch predictor tap
calculator 83 then calculates the optimal tap weight of a single-tap pitch predictor for the decoded
speech. The pitch predictor tap calculator 83 and the long-term postfilter 71 share a long buffer of
decoded speech samples. This buffer contains decoded speech samples s_d(-239), s_d(-238),
s_d(-237), ..., s_d(4), s_d(5), where s_d(1) through s_d(5) correspond to the current vector of decoded
speech. The long-term postfilter 71 uses this buffer as the delay unit of the filter. On the other
hand, the pitch predictor tap calculator 83 uses this buffer to calculate

$$\beta = \frac{\sum_{k=-99}^{0} s_d(k) \, s_d(k-p)}{\sum_{k=-99}^{0} s_d(k-p) \, s_d(k-p)} . \tag{35}$$

The long-term postfilter coefficient calculator 84 then takes the pitch period p and the pitch
predictor tap β and calculates the long-term postfilter coefficients b and g_l as follows.

$$b = \begin{cases} 0 & \text{if } \beta < 0.6 \\ 0.15 \, \beta & \text{if } 0.6 \le \beta \le 1 \\ 0.15 & \text{if } \beta > 1 \end{cases} \tag{36}$$

$$g_l = \frac{1}{1 + b} \tag{37}$$



In general, the closer β is to unity, the more periodic the speech waveform is. As can be seen
in equations (36) and (37), if β < 0.6, which roughly corresponds to unvoiced or transition regions
of speech, then b = 0 and g_l = 1, and the long-term postfilter transfer function becomes H_l(z) = 1,
which means the filtering operation of the long-term postfilter is totally disabled. On the other
hand, if 0.6 ≤ β ≤ 1, the long-term postfilter is turned on, and the degree of comb filtering is
determined by β. The more periodic the speech waveform, the more comb filtering is performed.
Finally, if β > 1, then b is limited to 0.15; this is to avoid too much comb filtering. The coefficient
g_l is a scaling factor of the long-term postfilter to ensure that the voiced regions of speech
waveforms do not get amplified relative to the unvoiced or transition regions. (If g_l were held
constant at unity, then after the long-term postfiltering, the voiced regions would be amplified by a
factor of 1+b roughly. This would make some consonants, which correspond to unvoiced and
transition regions, sound unclear or too soft.)
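The mapping from the pitch predictor tap to the two long-term postfilter coefficients is small enough to sketch directly. The Python below is only an illustration of equations (36) and (37); the function name `long_term_postfilter_coeffs` is hypothetical, and the two constants are taken from the PPFTH and PPFZCF entries of Table 1/G.728.

```python
PPFTH = 0.6    # tap threshold for turning off the pitch postfilter (Table 1/G.728)
PPFZCF = 0.15  # pitch postfilter zero controlling factor (Table 1/G.728)


def long_term_postfilter_coeffs(beta):
    """Return (b, g_l) for a pitch predictor tap beta, per equations (36) and (37)."""
    if beta < PPFTH:
        b = 0.0                # unvoiced or transition: postfilter disabled
    elif beta <= 1.0:
        b = PPFZCF * beta      # comb filtering grows with periodicity
    else:
        b = PPFZCF             # limit the amount of comb filtering
    return b, 1.0 / (1.0 + b)  # g_l keeps voiced regions from being amplified
```

With a tap of 0.3 the function returns b = 0 and g_l = 1, so the long-term postfilter passes the signal through unchanged.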

The short-term postfilter coefficient calculator 85 calculates the short-term postfilter
coefficients ā_i's, b̄_i's, and μ at the first vector of each frame according to equations (26), (27), and
(28).

4.8 Output PCM Format Conversion

This block converts the 5 components of the decoded speech vector into 5 corresponding A-law
or μ-law PCM samples and outputs these 5 PCM samples sequentially at 125 μs time intervals.
Note that if the internal linear PCM format has been scaled as described in section 3.1.1, the
inverse scaling must be performed before conversion to A-law or μ-law PCM.

5. COMPUTATIONAL DETAILS 

This section provides the computational details for each of the LD-CELP encoder and decoder 
elements. Sections 5.1 and 5.2 list the names of coder parameters and internal processing 
variables which will be referred to in later sections. The detailed specification of each block in 
Figure 2/G.728 through Figure 6/G.728 is given in Section 5.3 through the end of Section 5. To
encode and decode an input speech vector, the various blocks of the encoder and the decoder are
executed in an order which roughly follows the sequence from Section 5.3 to the end.

5.1 Description of Basic Coder Parameters

The names of basic coder parameters are defined in Table 1/G.728. In Table 1/G.728, the first
column gives the names of coder parameters which will be used in later detailed description of the
LD-CELP algorithm. If a parameter has been referred to in Section 3 or 4 but was represented by
a different symbol, that equivalent symbol will be given in the second column for easy reference.
Each coder parameter has a fixed value which is determined in the coder design stage. The third
column shows these fixed parameter values, and the fourth column is a brief description of the
coder parameters.

Table 1/G.728 Basic Coder Parameters of LD-CELP



Name      Equivalent  Value     Description
          Symbol
--------  ----------  --------  ----------------------------------------------------------
AGCFAC                0.99      AGC adaptation speed controlling factor
FAC       λ           253/256   Bandwidth expansion factor of synthesis filter
FACGP     λ_g         29/32     Bandwidth expansion factor of log-gain predictor
DIMINV                0.2       Reciprocal of vector dimension
IDIM                  5         Vector dimension (excitation block size)
GOFF                  32        Log-gain offset value
KPDELTA               6         Allowed deviation from previous pitch period
KPMIN                 20        Minimum pitch period (samples)
KPMAX                 140       Maximum pitch period (samples)
LPC                   50        Synthesis filter order
LPCLG                 10        Log-gain predictor order
LPCW                  10        Perceptual weighting filter order
NCWD                  128       Shape codebook size (no. of codevectors)
NFRSZ                 20        Frame size (adaptation cycle size in samples)
NG                    8         Gain codebook size (no. of gain levels)
NONR                  35        No. of non-recursive window samples for synthesis filter
NONRLG                20        No. of non-recursive window samples for log-gain predictor
NONRW                 30        No. of non-recursive window samples for weighting filter
NPWSZ                 100       Pitch analysis window size (samples)
NUPDATE               4         Predictor update period (in terms of vectors)
PPFTH                 0.6       Tap threshold for turning off pitch postfilter
PPFZCF                0.15      Pitch postfilter zero controlling factor
SPFPCF                0.75      Short-term postfilter pole controlling factor
SPFZCF                0.65      Short-term postfilter zero controlling factor
TAPTH                 0.4       Tap threshold for fundamental pitch replacement
TILTF                 0.15      Spectral tilt compensation controlling factor
WNCF                  257/256   White noise correction factor
WPCF      γ_2         0.6       Pole controlling factor of perceptual weighting filter
WZCF      γ_1         0.9       Zero controlling factor of perceptual weighting filter


5.2 Description of Internal Variables

The internal processing variables of LD-CELP are listed in Table 2/G.728, which has a layout
similar to Table 1/G.728. The second column shows the range of index in each variable array. The
fourth column gives the recommended initial values of the variables. The initial values of some
arrays are given in Annexes A, B or C. It is recommended (although not required) that the
internal variables be set to their initial values when the encoder or decoder just starts running, or
whenever a reset of coder states is needed (such as in DCME applications). These initial values
ensure that there will be no glitches right after start-up or resets.

Note that some variable arrays can share the same physical memory locations to save memory
space, although they are given different names in the tables to enhance clarity.

As mentioned in earlier sections, the processing sequence has a basic adaptation cycle of 4
speech vectors. The variable ICOUNT is used as the vector index. In other words, ICOUNT = n
when the encoder or decoder is processing the n-th speech vector in an adaptation cycle.

Table 2/G.728 LD-CELP Internal Processing Variables



Name      Array Index       Equivalent   Initial           Description
          Range             Symbol       Value
--------  ----------------  -----------  ----------------  ---------------------------------------------------
A         1 to LPC+1        -a_{i-1}     1,0,0,...         Synthesis filter coefficients
AL        1 to 3                         Annex D           1 kHz lowpass filter denominator coeff.
AP        1 to 11                                          Short-term postfilter denominator coeff.
APF       1 to 11                        1,0,0,...         10th-order LPC filter coefficients
ATMP      1 to LPC+1                                       Temporary buffer for synthesis filter coeff.
AWP       1 to LPCW+1                                      Perceptual weighting filter denominator coeff.
AWZ       1 to LPCW+1                    1,0,0,...         Perceptual weighting filter numerator coeff.
AWZTMP    1 to LPCW+1                                      Temporary buffer for weighting filter coeff.
AZ        1 to 11                        1,0,0,...         Short-term postfilter numerator coeff.
B         1                 b            0                 Long-term postfilter coefficient
BL        1 to 4                         Annex D           1 kHz lowpass filter numerator coeff.
DEC       -34 to 25                      0,0,...,0         4:1 decimated LPC prediction residual
D         -139 to 100       d(k)         0,0,...,0         LPC prediction residual
ET        1 to IDIM         e(n)         0,0,...,0         Gain-scaled excitation vector
FACV      1 to LPC+1        λ^{i-1}      Annex C           Synthesis filter BW broadening vector
FACGPV    1 to LPCLG+1      λ_g^{i-1}    Annex C           Gain predictor BW broadening vector
G2        1 to NG                        Annex B           2 times gain levels in gain codebook
GAIN      1                 σ(n)                           Excitation gain
GB        1 to NG-1         d_i          Annex B           Mid-point between adjacent gain levels
GL        1                 g_l          1                 Long-term postfilter scaling factor
GP        1 to LPCLG+1                   1,-1,0,0,...      log-gain linear predictor coeff.
GPTMP     1 to LPCLG+1                                     temp. array for log-gain linear predictor coeff.
GQ        1 to NG                        Annex B           Gain levels in the gain codebook
GSQ       1 to NG                        Annex B           Squares of gain levels in gain codebook
GSTATE    1 to LPCLG        δ(n)         -32,-32,...,-32   Memory of the log-gain linear predictor
GTMP      1 to 4                         -32,-32,-32,-32   Temporary log-gain buffer
H         1 to IDIM         h(n)         1,0,0,0,0         Impulse response vector of F(z)W(z)
ICHAN
ICOUNT                                                     Speech vector counter (indexed from 1 to 4)
IG                          i                              Best 3-bit gain codebook index
IP                                       IPINIT**          Address pointer to LPC prediction residual
IS                          j                              Best 7-bit shape codebook index
KP                          p                              Pitch period of the current frame
KP1                         p̂            50                Pitch period of the previous frame
PN        1 to IDIM         p(n)                           Correlation vector for codebook search
PTAP      1                 β                              Pitch predictor tap computed by block 83
R         1 to NR+1*                                       Autocorrelation coefficients
RC        1 to NR*                                         Reflection coeff., also as a scratch array
RCTMP     1 to LPC                                         Temporary buffer for reflection coeff.
REXP      1 to LPC+1                                       Recursive part of autocorrelation, syn. filter
REXPLG    1 to LPCLG+1                                     Recursive part of autocorrelation, log-gain pred.
REXPW     1 to LPCW+1                    0,0,...,0         Recursive part of autocorrelation, weighting filter
RTMP      1 to LPC+1                                       Temporary buffer for autocorrelation coeff.
S         1 to IDIM         s(n)         0,0,...,0         Uniform PCM input speech vector
SB        1 to 105                       0,0,...,0         Buffer for previously quantized speech
SBLG      1 to 34                        0,0,...,0         Buffer for previous log-gain
SBW       1 to 60                        0,0,...,0         Buffer for previous input speech
SCALE     1                                                Unfiltered postfilter scaling factor
SCALEFIL  1                              1                 Lowpass filtered postfilter scaling factor
SD        1 to IDIM         s_d(k)                         Decoded speech buffer
SPF       1 to IDIM                                        Postfiltered speech vector
SPFPCFV   1 to 11                        Annex C           Short-term postfilter pole controlling vector
SPFZCFV   1 to 11                        Annex C           Short-term postfilter zero controlling vector
SO                          s_o(k)                         A-law or μ-law PCM input speech sample
SU        1                                                Uniform PCM input speech sample
ST        -239 to IDIM                   0,0,...,0         Quantized speech vector
STATELPC  1 to LPC                       0,0,...,0         Synthesis filter memory
STLPCI    1 to 10                        0,0,...,0         LPC inverse filter memory
STLPF     1 to 3                         0,0,0             1 kHz lowpass filter memory
STMP      1 to 4*IDIM                    0,0,...,0         Buffer for per. wt. filter hybrid window
STPFFIR   1 to 10                        0,0,...,0         Short-term postfilter memory, all-zero section
STPFIIR   1 to 10                        0,0,...,0         Short-term postfilter memory, all-pole section
SUMFIL    1                                                Sum of absolute value of postfiltered speech
SUMUNFIL  1                                                Sum of absolute value of decoded speech
SW        1 to IDIM         v(n)                           Perceptually weighted speech vector
TARGET    1 to IDIM                                        (gain-normalized) VQ target vector
TEMP      1 to IDIM                                        scratch array for temporary working space
TILTZ     1                              0                 Short-term postfilter tilt-compensation coeff.
WFIR      1 to LPCW                      0,0,...,0         Memory of weighting filter 4, all-zero portion
WIIR      1 to LPCW                      0,0,...,0         Memory of weighting filter 4, all-pole portion
WNR       1 to 105                       Annex A           Window function for synthesis filter
WNRLG     1 to 34                        Annex A           Window function for log-gain predictor
WNRW      1 to 60                        Annex A           Window function for weighting filter
WPCFV     1 to LPCW+1                    Annex C           Perceptual weighting filter pole controlling vector
WS        1 to 105                                         Work space array for intermediate variables
WZCFV     1 to LPCW+1                    Annex C           Perceptual weighting filter zero controlling vector
Y         1 to IDIM*NCWD                 Annex B           Shape codebook array
Y2        1 to NCWD                      Energy of y_j     Energy of convolved shape codevector
YN        1 to IDIM         y(n)                           Quantized excitation vector
ZIRWFIR   1 to LPCW                                        Memory of weighting filter 10, all-zero portion
ZIRWIIR   1 to LPCW                      0,0,...,0         Memory of weighting filter 10, all-pole portion

*  NR = Max(LPCW, LPCLG) > IDIM
** IPINIT = NPWSZ-NFRSZ+IDIM


It should be noted that, for the convenience of Levinson-Durbin recursion, the first element of
A, ATMP, AWP, AWZ, and GP arrays are always 1 and never get changed, and, for i ≥ 2, the i-th
elements are the (i-1)-th elements of the corresponding symbols in Section 3.

In the following sections, the asterisk * denotes arithmetic multiplication.

5.3 Input PCM Format Conversion (block 1)
Input: SO
Output: SU

Function: Convert A-law or μ-law or 16-bit linear input sample to uniform PCM sample.

Since the operation of this block is completely defined in CCITT Recommendations G.721 or
G.711, we will not repeat it here. However, recall from section 3.1.1 that some scaling may be
necessary to conform to this description's specification of an input range of -4095 to +4095.



5.4 Vector Buffer (block 2) 
Input: SU 
Output: S 

Function: Buffer 5 consecutive uniform PCM speech samples to form a single 5-dimensional
speech vector. 



5.5 Adapter for Perceptual Weighting Filter (block 3, Figure 4(a)/G.728)

The three blocks (36, 37 and 38) in Figure 4(a)/G.728 are now specified in detail below.

HYBRID WINDOWING MODULE (block 36)

Input: STMP
Output: R

Function: Apply the hybrid window to input speech and compute autocorrelation coefficients.

The operation of this module is now described below, using a "Fortran-like" style, with loop
boundaries indicated by indentation and comments on the right-hand side of "|". The following
algorithm is to be used once every adaptation cycle (20 samples). The STMP array holds 4
consecutive input speech vectors up to the second speech vector of the current adaptation cycle.
That is, STMP(1) through STMP(5) is the third input speech vector of the previous adaptation
cycle (zero initially), STMP(6) through STMP(10) is the fourth input speech vector of the
previous adaptation cycle (zero initially), STMP(11) through STMP(15) is the first input speech
vector of the current adaptation cycle, and STMP(16) through STMP(20) is the second input
speech vector of the current adaptation cycle.

     N1=LPCW+NFRSZ                          | compute some constants (can be
     N2=LPCW+NONRW                          | precomputed and stored in memory)
     N3=LPCW+NFRSZ+NONRW

     For N=1,2,...,N2, do the next line
          SBW(N)=SBW(N+NFRSZ)               | shift the old signal buffer;
     For N=1,2,...,NFRSZ, do the next line
          SBW(N2+N)=STMP(N)                 | shift in the new signal;
                                            | SBW(N3) is the newest sample
     K=1
     For N=N3,N3-1,...,3,2,1, do the next 2 lines
          WS(N)=SBW(N)*WNRW(K)              | multiply the window function
          K=K+1

     For I=1,2,...,LPCW+1, do the next 4 lines
          TMP=0.
          For N=LPCW+1,LPCW+2,...,N1, do the next line
               TMP=TMP+WS(N)*WS(N+1-I)
          REXPW(I)=(1/2)*REXPW(I)+TMP       | update the recursive component

     For I=1,2,...,LPCW+1, do the next 3 lines
          R(I)=REXPW(I)
          For N=N1+1,N1+2,...,N3, do the next line
               R(I)=R(I)+WS(N)*WS(N+1-I)    | add the non-recursive component

     R(1)=R(1)*WNCF                         | white noise correction
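The hybrid windowing steps above can be sketched in Python with 0-based lists. This is an illustration under stated assumptions, not the normative procedure: the function name `hybrid_window_autocorr` and its argument names are hypothetical, and the window passed in is whatever the caller supplies (the Annex A window values are not reproduced in this text, so the usage below substitutes an all-ones placeholder).

```python
def hybrid_window_autocorr(sbw, stmp, wnrw, rexpw, lpcw=10, nfrsz=20, nonrw=30,
                           decay=0.5, wncf=257.0 / 256.0):
    """One adaptation-cycle update of block 36; returns R as lpcw+1 values.

    sbw   : signal buffer of length lpcw+nfrsz+nonrw, updated in place
    stmp  : the nfrsz newest input samples
    wnrw  : window; wnrw[0] weights the newest sample (caller-supplied)
    rexpw : recursive autocorrelation state of length lpcw+1, updated in place
    """
    n1 = lpcw + nfrsz
    n2 = lpcw + nonrw
    n3 = lpcw + nfrsz + nonrw
    sbw[:n2] = sbw[nfrsz:n3]          # shift the old signal buffer
    sbw[n2:n3] = stmp                 # shift in the new signal
    ws = [sbw[n] * wnrw[n3 - 1 - n] for n in range(n3)]  # apply the window
    r = []
    for i in range(lpcw + 1):
        # recursive component: samples leaving the non-recursive window
        tmp = sum(ws[n] * ws[n - i] for n in range(lpcw, n1))
        rexpw[i] = decay * rexpw[i] + tmp
        # non-recursive component: the newest nonrw samples
        r.append(rexpw[i] + sum(ws[n] * ws[n - i] for n in range(n1, n3)))
    r[0] *= wncf                      # white noise correction
    return r
```

The same routine covers blocks 49 and 43 by passing the corresponding order, frame size, window length and decay factor (3/4 instead of 1/2), which is why the parameters are left as arguments.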
LEVINSON-DURBIN RECURSION MODULE (block 37)

Input: R (output of block 36)
Output: AWZTMP

Function: Convert autocorrelation coefficients to linear predictor coefficients.

This block is executed once every 4-vector adaptation cycle. It is done at ICOUNT=3 after the
processing of block 36 has finished. Since the Levinson-Durbin recursion is well-known prior art,
the algorithm is given below without explanation.

     If R(LPCW+1) = 0, go to LABEL               | Skip if zero
     If R(1) ≤ 0, go to LABEL                    | Skip if zero signal.

     RC(1)=-R(2)/R(1)
     AWZTMP(1)=1.
     AWZTMP(2)=RC(1)                             | First-order predictor
     ALPHA=R(1)+R(2)*RC(1)
     If ALPHA ≤ 0, go to LABEL                   | Abort if ill-conditioned

     For MINC=2,3,4,...,LPCW, do the following
          SUM=0.
          For IP=1,2,3,...,MINC, do the next 2 lines
               N1=MINC-IP+2
               SUM=SUM+R(N1)*AWZTMP(IP)

          RC(MINC)=-SUM/ALPHA                    | Reflection coeff.
          MH=MINC/2+1
          For IP=2,3,4,...,MH, do the next 4 lines
               IB=MINC-IP+2
               AT=AWZTMP(IP)+RC(MINC)*AWZTMP(IB)
               AWZTMP(IB)=AWZTMP(IB)+RC(MINC)*AWZTMP(IP)   | Predictor coeff.
               AWZTMP(IP)=AT

          AWZTMP(MINC+1)=RC(MINC)
          ALPHA=ALPHA+RC(MINC)*SUM               | Prediction residual energy.
          If ALPHA ≤ 0, go to LABEL              | Abort if ill-conditioned.

     Repeat the above for the next MINC
                                                 | Program terminates normally
     Exit this program                           | if execution proceeds to
                                                 | here.

     LABEL: If program proceeds to here, ill-conditioning had happened,
            then, skip block 38, do not update the weighting filter coefficients
            (That is, use the weighting filter coefficients of the previous
            adaptation cycle.)
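For readers more comfortable with a conventional language, the recursion above can be transcribed to Python with 0-based indexing. This is a sketch, not the reference implementation: the function name `levinson_durbin` is hypothetical, and returning `None` is simply one way to model the jump to LABEL (the caller then keeps the coefficients of the previous adaptation cycle).

```python
def levinson_durbin(r, order):
    """Return [1, a_1, ..., a_order], or None when the LABEL branch is taken.

    r holds the autocorrelation values r[0]..r[order].
    """
    if r[order] == 0 or r[0] <= 0:
        return None                      # skip if zero / zero signal
    a = [0.0] * (order + 1)
    a[0] = 1.0
    rc = -r[1] / r[0]
    a[1] = rc                            # first-order predictor
    alpha = r[0] + r[1] * rc
    if alpha <= 0:
        return None                      # abort if ill-conditioned
    for m in range(2, order + 1):
        s = sum(r[m - ip] * a[ip] for ip in range(m))
        rc = -s / alpha                  # reflection coefficient
        for ip in range(1, m // 2 + 1):  # update predictor coefficients in pairs
            ib = m - ip
            at = a[ip] + rc * a[ib]
            a[ib] = a[ib] + rc * a[ip]
            a[ip] = at
        a[m] = rc
        alpha = alpha + rc * s           # prediction residual energy
        if alpha <= 0:
            return None                  # abort if ill-conditioned
    return a
```

For the autocorrelation sequence 1, 0.5, 0.25 of a first-order process, the sketch returns a first coefficient of -0.5 and a second coefficient of zero, as expected for a signal that a one-tap predictor already models exactly.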



WEIGHTING FILTER COEFFICIENT CALCULATOR (block 38) 

Input: AWZTMP 
Output: AWZ, AWP

Function: Calculate the perceptual weighting filter coefficients from the linear predictor
coefficients for input speech.

This block is executed once every adaptation cycle. It is done at ICOUNT=3 after the processing
of block 37 has finished.

     For I=2,3,...,LPCW+1, do the next line
          AWP(I)=WPCFV(I)*AWZTMP(I)        | Denominator coeff.
     For I=2,3,...,LPCW+1, do the next line
          AWZ(I)=WZCFV(I)*AWZTMP(I)        | Numerator coeff.
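The two scaling loops amount to multiplying the i-th predictor coefficient by a power of a controlling factor. The Python sketch below assumes that the controlling vectors are plain powers of WZCF = 0.9 and WPCF = 0.6 from Table 1/G.728, that is WZCFV(I) = WZCF^(I-1) and WPCFV(I) = WPCF^(I-1); the actual vectors are tabulated in Annex C, which is not reproduced here, so treat that as an assumption. The function names are hypothetical.

```python
def control_vector(factor, order):
    """Powers factor**0 .. factor**order (assumed form of the Annex C vectors)."""
    return [factor ** i for i in range(order + 1)]


def weighting_filter_coeffs(awztmp, wzcf=0.9, wpcf=0.6):
    """Return (AWZ, AWP) from the predictor coefficients AWZTMP of block 37."""
    order = len(awztmp) - 1
    awz = [g * a for g, a in zip(control_vector(wzcf, order), awztmp)]  # numerator
    awp = [g * a for g, a in zip(control_vector(wpcf, order), awztmp)]  # denominator
    return awz, awp
```

The leading coefficient stays at 1 in both outputs because the zeroth power of either factor is 1, which matches the loops starting at I=2.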



5.6 Backward Synthesis Filter Adapter (block 23, Figure 5/G.728)

The three blocks (49, 50, and 51) in Figure 5/G.728 are specified below.

HYBRID WINDOWING MODULE (block 49)

Input: STTMP
Output: RTMP

Function: Apply the hybrid window to quantized speech and compute autocorrelation
coefficients.

The operation of this block is essentially the same as in block 36, except for some
substitutions of parameters and variables, and for the sampling instant when the autocorrelation
coefficients are obtained. As described in Section 3, the autocorrelation coefficients are computed
based on the quantized speech vectors up to the last vector in the previous 4-vector adaptation
cycle. In other words, the autocorrelation coefficients used in the current adaptation cycle are
based on the information contained in the quantized speech up to the last (20-th) sample of the
previous adaptation cycle. (This is in fact how we define the adaptation cycle.) The STTMP array
contains the 4 quantized speech vectors of the previous adaptation cycle.

     N1=LPC+NFRSZ                           | compute some constants (can be
     N2=LPC+NONR                            | precomputed and stored in memory)
     N3=LPC+NFRSZ+NONR

     For N=1,2,...,N2, do the next line
          SB(N)=SB(N+NFRSZ)                 | shift the old signal buffer;
     For N=1,2,...,NFRSZ, do the next line
          SB(N2+N)=STTMP(N)                 | shift in the new signal;
                                            | SB(N3) is the newest sample
     K=1
     For N=N3,N3-1,...,3,2,1, do the next 2 lines
          WS(N)=SB(N)*WNR(K)                | multiply the window function
          K=K+1

     For I=1,2,...,LPC+1, do the next 4 lines
          TMP=0.
          For N=LPC+1,LPC+2,...,N1, do the next line
               TMP=TMP+WS(N)*WS(N+1-I)
          REXP(I)=(3/4)*REXP(I)+TMP         | update the recursive component

     For I=1,2,...,LPC+1, do the next 3 lines
          RTMP(I)=REXP(I)
          For N=N1+1,N1+2,...,N3, do the next line
               RTMP(I)=RTMP(I)+WS(N)*WS(N+1-I)
                                            | add the non-recursive component
     RTMP(1)=RTMP(1)*WNCF                   | white noise correction



LEVINSON-DURBIN RECURSION MODULE (block 50)

Input: RTMP
Output: ATMP

Function: Convert autocorrelation coefficients to synthesis filter coefficients.

The operation of this block is exactly the same as in block 37, except for some substitutions of
parameters and variables. However, special care should be taken when implementing this block.
As described in Section 3, although the autocorrelation RTMP array is available at the first vector
of each adaptation cycle, the actual updates of synthesis filter coefficients will not take place until
the third vector. This intentional delay of updates allows the real-time hardware to spread the
computation of this module over the first three vectors of each adaptation cycle. While this
module is being executed during the first two vectors of each cycle, the old set of synthesis filter
coefficients (the array "A") obtained in the previous cycle is still being used. This is why we need
to keep a separate array ATMP to avoid overwriting the old "A" array. Similarly, RTMP,
RCTMP, ALPHATMP, etc. are used to avoid interference to other Levinson-Durbin recursion
modules (blocks 37 and 44).

     If RTMP(LPC+1) = 0, go to LABEL             | Skip if zero
     If RTMP(1) ≤ 0, go to LABEL                 | Skip if zero signal.

     RCTMP(1)=-RTMP(2)/RTMP(1)
     ATMP(1)=1.
     ATMP(2)=RCTMP(1)                            | First-order predictor
     ALPHATMP=RTMP(1)+RTMP(2)*RCTMP(1)
     If ALPHATMP ≤ 0, go to LABEL                | Abort if ill-conditioned

     For MINC=2,3,4,...,LPC, do the following
          SUM=0.
          For IP=1,2,3,...,MINC, do the next 2 lines
               N1=MINC-IP+2
               SUM=SUM+RTMP(N1)*ATMP(IP)

          RCTMP(MINC)=-SUM/ALPHATMP              | Reflection coeff.
          MH=MINC/2+1
          For IP=2,3,4,...,MH, do the next 4 lines
               IB=MINC-IP+2
               AT=ATMP(IP)+RCTMP(MINC)*ATMP(IB)
               ATMP(IB)=ATMP(IB)+RCTMP(MINC)*ATMP(IP)   | Update predictor coeff.
               ATMP(IP)=AT

          ATMP(MINC+1)=RCTMP(MINC)
          ALPHATMP=ALPHATMP+RCTMP(MINC)*SUM      | Pred. residual energy.
          If ALPHATMP ≤ 0, go to LABEL           | Abort if ill-conditioned.

     Repeat the above for the next MINC
                                                 | Recursion completed normally
     Exit this program                           | if execution proceeds to
                                                 | here.

     LABEL: If program proceeds to here, ill-conditioning had happened,
            then, skip block 51, do not update the synthesis filter coefficients
            (That is, use the synthesis filter coefficients of the previous
            adaptation cycle.)



BANDWIDTH EXPANSION MODULE (block 51)

Input: ATMP
Output: A

Function: Scale synthesis filter coefficients to expand the bandwidths of spectral peaks.

This block is executed only once every adaptation cycle. It is done after the processing of block
50 has finished and before the execution of blocks 9 and 10 at ICOUNT=3 take place. When the
execution of this module is finished and ICOUNT=3, then we copy the ATMP array to the "A"
array to update the filter coefficients.

     For I=2,3,...,LPC+1, do the next line
          ATMP(I)=FACV(I)*ATMP(I)               | scale coeff.

     Wait until ICOUNT=3, then
     For I=2,3,...,LPC+1, do the next line      | Update coeff. at the third
          A(I)=ATMP(I)                          | vector of each cycle.
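Why scaling the coefficients broadens spectral peaks can be seen on a second-order example: multiplying the i-th coefficient by FAC^i moves every pole radially toward the origin by the factor FAC. The Python sketch below is only an illustration; it assumes FACV(I) = FAC^(I-1) with FAC = 253/256 from Table 1/G.728 (the Annex C vector itself is not reproduced here), and the helper names are hypothetical.

```python
import cmath


def expand_bandwidth(atmp, fac=253.0 / 256.0):
    """Scale coefficient i by fac**i (assumed form of the FACV vector)."""
    return [c * fac ** i for i, c in enumerate(atmp)]


def max_pole_radius_order2(a):
    """Largest pole magnitude of 1 / (1 + a[1] z^-1 + a[2] z^-2)."""
    disc = cmath.sqrt(a[1] * a[1] - 4.0 * a[2])
    return max(abs((-a[1] + disc) / 2.0), abs((-a[1] - disc) / 2.0))
```

For the coefficients 1, -1.8, 0.9025 the poles sit at radius 0.95; after `expand_bandwidth` they sit at 0.95 * 253/256, slightly further from the unit circle, so the corresponding spectral peak is lower and wider.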



5.7 Backward Vector Gain Adapter (block 20, Figure 6/G.728)

The blocks in Figure 6/G.728 are specified below. For implementation efficiency, some
blocks are described together as a single block (they are shown separately in Figure 6/G.728 just
to explain the concept). All blocks in Figure 6/G.728 are executed once every speech vector,
except for blocks 43, 44 and 45, which are executed only when ICOUNT=2.

1-VECTOR DELAY, RMS CALCULATOR, AND LOGARITHM CALCULATOR
(blocks 67, 39, and 40)

Input: ET
Output: ETRMS

Function: Calculate the dB level of the Root-Mean Square (RMS) value of the previous
gain-scaled excitation vector.

When these three blocks are executed (which is before the VQ codebook search), the ET array
contains the gain-scaled excitation vector determined for the previous speech vector. Therefore,
the 1-vector delay unit (block 67) is automatically executed. (It appears in Figure 6/G.728 just to
enhance clarity.) Since the logarithm calculator immediately follows the RMS calculator, the
square root operation in the RMS calculator can be implemented as a "divide-by-two" operation to
the output of the logarithm calculator. Hence, the output of the logarithm calculator (the dB
value) is 10 * log10 ( energy of ET / IDIM ). To avoid overflow of logarithm value when ET = 0
(after system initialization or reset), the argument of the logarithm operation is clipped to 1 if it is
too small. Also, we note that ETRMS is usually kept in an accumulator, as it is a temporary value
which is immediately processed in block 42.

     ETRMS = ET(1)*ET(1)
     For K=2,3,...,IDIM, do the next line        | Compute energy of ET.
          ETRMS = ETRMS + ET(K)*ET(K)

     ETRMS = ETRMS*DIMINV                        | Divide by IDIM.
     If ETRMS < 1., set ETRMS = 1.               | Clip to avoid log overflow.
     ETRMS = 10 * log10 (ETRMS)                  | Compute dB value.
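The dB computation above is short enough to sketch in Python. The function name `excitation_level_db` is hypothetical; DIMINV = 0.2 is the reciprocal of the vector dimension from Table 1/G.728.

```python
import math

DIMINV = 0.2  # reciprocal of the vector dimension IDIM = 5 (Table 1/G.728)


def excitation_level_db(et):
    """dB level of the RMS value of a gain-scaled excitation vector."""
    energy = sum(x * x for x in et) * DIMINV  # mean energy per sample
    if energy < 1.0:
        energy = 1.0                          # clip so the logarithm cannot blow up
    return 10.0 * math.log10(energy)          # 10*log10(mean energy) = 20*log10(RMS)
```

An all-zero vector therefore maps to 0 dB rather than to minus infinity, and a vector whose samples all equal 10 maps to about 20 dB.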
LOG-GAIN OFFSET SUBTRACTOR (block 42)

Input: ETRMS, GOFF
Output: GSTATE(1)

Function: Subtract the log-gain offset value held in block 41 from the output of block 40 (dB
gain level).

     GSTATE(1) = ETRMS - GOFF



HYBRID WINDOWING MODULE (block 43) 

Input: GTMP 
Output: R

Function: Apply the hybrid window to offset-subtracted log-gain sequence and compute
autocorrelation coefficients.

The operation of this block is very similar to block 36, except for some substitutions of
parameters and variables, and for the sampling instant when the autocorrelation coefficients are
obtained.

An important difference between block 36 and this block is that only 4 (rather than 20) gain
samples are fed to this block each time the block is executed.

The log-gain predictor coefficients are updated at the second vector of each adaptation cycle.
The GTMP array below contains 4 offset-removed log-gain values, starting from the log-gain of
the second vector of the previous adaptation cycle, which is GTMP(1), and ending with the log-gain
of the first vector of the current adaptation cycle. GTMP(4) is the offset-removed log-gain value
from the first vector of the current adaptation cycle, the newest value.

     N1=LPCLG+NUPDATE                       | compute some constants (can be
     N2=LPCLG+NONRLG                        | precomputed and stored in memory)
     N3=LPCLG+NUPDATE+NONRLG

     For N=1,2,...,N2, do the next line
          SBLG(N)=SBLG(N+NUPDATE)           | shift the old signal buffer;
     For N=1,2,...,NUPDATE, do the next line
          SBLG(N2+N)=GTMP(N)                | shift in the new signal;
                                            | SBLG(N3) is the newest sample
     K=1
     For N=N3,N3-1,...,3,2,1, do the next 2 lines
          WS(N)=SBLG(N)*WNRLG(K)            | multiply the window function
          K=K+1

     For I=1,2,...,LPCLG+1, do the next 4 lines
          TMP=0.
          For N=LPCLG+1,LPCLG+2,...,N1, do the next line
               TMP=TMP+WS(N)*WS(N+1-I)
          REXPLG(I)=(3/4)*REXPLG(I)+TMP     | update the recursive component

     For I=1,2,...,LPCLG+1, do the next 3 lines
          R(I)=REXPLG(I)
          For N=N1+1,N1+2,...,N3, do the next line
               R(I)=R(I)+WS(N)*WS(N+1-I)    | add the non-recursive component

     R(1)=R(1)*WNCF                         | white noise correction



LEVINSON-DURBIN RECURSION MODULE (block 44) 

Input: R (output of block 43) 
Output: GPTMP

Function: Convert autocorrelation coefficients to log-gain predictor coefficients.

The operation of this block is exactly the same as in block 37, except for the substitutions of
parameters and variables indicated below: replace LPCW by LPCLG and AWZ by GP. This
block is executed only when ICOUNT=2, after block 43 is executed. Note that as the first step,
the value of R(LPCLG+1) will be checked. If it is zero, we skip blocks 44 and 45 without
updating the log-gain predictor coefficients. (That is, we keep using the old log-gain predictor
coefficients determined in the previous adaptation cycle.) This special procedure is designed to
avoid a very small glitch that would have otherwise happened right after system initialization or
reset. In case the matrix is ill-conditioned, we also skip block 45 and use the old values.

BANDWIDTH EXPANSION MODULE (block 45)

Input: GPTMP
Output: GP

Function: Scale log-gain predictor coefficients to expand the bandwidths of spectral peaks.
This block is executed only when ICOUNT=2, after block 44 is executed.

     For I=2,3,...,LPCLG+1, do the next line
          GP(I)=FACGPV(I)*GPTMP(I)         | scale coeff.



LOG-GAIN LINEAR PREDICTOR (block 46) 

Input: GP, GSTATE 
Output: GAIN

Function: Predict the current value of the offset-subtracted log-gain.

     GAIN = 0.
     For I=LPCLG,LPCLG-1,...,3,2, do the next 2 lines
          GAIN = GAIN - GP(I+1)*GSTATE(I)
          GSTATE(I) = GSTATE(I-1)
     GAIN = GAIN - GP(2)*GSTATE(1)

LOG-GAIN OFFSET ADDER (between blocks 46 and 47)

Input: GAIN, GOFF
Output: GAIN

Function: Add the log-gain offset value back to the log-gain predictor output.

     GAIN = GAIN + GOFF

LOG-GAIN LIMITER (block 47)

Input: GAIN
Output: GAIN

Function: Limit the range of the predicted logarithmic gain.

     If GAIN < 0., set GAIN = 0.           | Corresponds to linear gain 1.
     If GAIN > 60., set GAIN = 60.         | Corresponds to linear gain 1000.

INVERSE LOGARITHM CALCULATOR (block 48)

Input: GAIN
Output: GAIN

Function: Convert the predicted logarithmic gain (in dB) back to linear domain.

     GAIN = 10 ^ (GAIN/20)
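Blocks 46 through 48 form one short chain: predict the offset-removed log-gain, add the offset back, limit the result to the range 0 to 60 dB, and convert to the linear domain. A minimal Python sketch of that chain follows, assuming 0-based lists in which `gp[0]` is the fixed leading 1 and `gstate[0]` is the most recent offset-removed log-gain; the function name `predict_excitation_gain` is hypothetical.

```python
def predict_excitation_gain(gp, gstate, goff=32.0):
    """Return the linear excitation gain; shifts the predictor memory in place."""
    # block 46: linear prediction of the offset-removed log-gain
    log_gain = -sum(gp[i + 1] * gstate[i] for i in range(len(gstate)))
    gstate[1:] = gstate[:-1]  # memory shift; gstate[0] is refilled later by block 42
    log_gain += goff                          # offset adder
    log_gain = min(max(log_gain, 0.0), 60.0)  # block 47: limit to 0..60 dB
    return 10.0 ** (log_gain / 20.0)          # block 48: back to linear domain
```

With the initial values of Table 2/G.728 (GP = 1, -1, 0, 0, ... and every GSTATE entry equal to -32) the predicted log-gain is 0 dB, so the linear gain right after a reset is 1.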






5.8 Perceptual Weighting Filter 

PERCEPTUAL WEIGHTING FILTER (block 4) 



Input: S, AWZ, AWP
Output: SW

Function: Filter the input speech vector to achieve perceptual weighting.

     For K=1,2,...,IDIM, do the following
          SW(K)=S(K)
          For J=LPCW,LPCW-1,...,3,2, do the next 2 lines
               SW(K)=SW(K)+WFIR(J)*AWZ(J+1)      | All-zero part
               WFIR(J)=WFIR(J-1)                 | of the filter.
          SW(K)=SW(K)+WFIR(1)*AWZ(2)             | Handle last one
          WFIR(1)=S(K)                           | differently.

          For J=LPCW,LPCW-1,...,3,2, do the next 2 lines
               SW(K)=SW(K)-WIIR(J)*AWP(J+1)      | All-pole part
               WIIR(J)=WIIR(J-1)                 | of the filter.
          SW(K)=SW(K)-WIIR(1)*AWP(2)             | Handle last one
          WIIR(1)=SW(K)                          | differently.

     Repeat the above for the next K
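The weighting filter above is a direct-form pole-zero filter whose two delay lines (WFIR and WIIR) persist between vectors. A compact Python sketch of the same computation is given below; the function name `pole_zero_filter` and the 0-based list layout (index 0 of each memory holds the most recent sample) are assumptions made for the example.

```python
def pole_zero_filter(s, awz, awp, wfir, wiir):
    """Filter the samples in s; wfir and wiir are filter memories updated in place.

    awz and awp hold numerator and denominator coefficients with a leading 1.
    """
    out = []
    for x in s:
        y = x
        y += sum(m * c for m, c in zip(wfir, awz[1:]))  # all-zero part
        y -= sum(m * c for m, c in zip(wiir, awp[1:]))  # all-pole part
        wfir.insert(0, x)  # newest input sample enters the all-zero memory
        wfir.pop()
        wiir.insert(0, y)  # newest output sample enters the all-pole memory
        wiir.pop()
        out.append(y)
    return out
```

As a quick check, a unit impulse through the first-order filter with numerator 1, -0.9 and denominator 1, -0.6 gives 1, -0.3, -0.18.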
5.9 Computation of Zero-Input Response Vector



Section 3.5 explains how a "zero-input response vector" r{n) is computed by blocks 9 and 10. 
Now the operation of these two blocks during this phase is specified below. Their operation 
during the "memory update phase" will be described later. 

SYNTHESIS FILTER (block 9) DURING ZERO-INPUT RESPONSE COMPUTATION 

Input: A, STATELPC
Output: TEMP

Function: Compute the zero-input response vector of the synthesis filter.

     For K=1,2,...,IDIM, do the following
          TEMP(K)=0.
          For J=LPC,LPC-1,...,3,2, do the next 2 lines
               TEMP(K)=TEMP(K)-STATELPC(J)*A(J+1)     | Multiply-add.
               STATELPC(J)=STATELPC(J-1)              | Memory shift.
          TEMP(K)=TEMP(K)-STATELPC(1)*A(2)            | Handle last one
          STATELPC(1)=TEMP(K)                         | differently.

     Repeat the above for the next K



PERCEPTUAL WEIGHTING FILTER DURING ZERO-INPUT RESPONSE COMPUTATION 

(block 10) 

Input: AWZ, AWP, ZIRWFIR, ZIRWIIR, TEMP computed above
Output: ZIR

Function: Compute the zero-input response vector of the perceptual weighting filter.

     For K=1,2,...,IDIM, do the following
          TMP=TEMP(K)
          For J=LPCW,LPCW-1,...,3,2, do the next 2 lines
               TEMP(K)=TEMP(K)+ZIRWFIR(J)*AWZ(J+1)    | All-zero part
               ZIRWFIR(J)=ZIRWFIR(J-1)                | of the filter.
          TEMP(K)=TEMP(K)+ZIRWFIR(1)*AWZ(2)           | Handle last one
          ZIRWFIR(1)=TMP                              | differently.

          For J=LPCW,LPCW-1,...,3,2, do the next 2 lines
               TEMP(K)=TEMP(K)-ZIRWIIR(J)*AWP(J+1)    | All-pole part
               ZIRWIIR(J)=ZIRWIIR(J-1)                | of the filter.
          ZIR(K)=TEMP(K)-ZIRWIIR(1)*AWP(2)            | Handle last one
          ZIRWIIR(1)=ZIR(K)                           | differently.

     Repeat the above for the next K
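A zero-input response is simply what a filter emits when its input is held at zero and only its memory drives the output. The Python sketch below shows this for the synthesis filter of block 9; it is an illustration only, with a hypothetical function name and a 0-based layout in which `statelpc[0]` is the most recent output sample and `a[0]` is the fixed leading 1.

```python
def synthesis_zero_input_response(a, statelpc, idim=5):
    """Ring out the all-pole synthesis filter for idim samples with zero input.

    statelpc is updated in place, as in the pseudocode for block 9.
    """
    temp = []
    for _ in range(idim):
        y = -sum(a[j + 1] * statelpc[j] for j in range(len(statelpc)))  # multiply-add
        statelpc.insert(0, y)  # memory shift: newest output goes to the front
        statelpc.pop()
        temp.append(y)
    return temp
```

For a first-order filter with coefficients 1, -0.5 and a memory holding 1.0, the ring-out is 0.5, 0.25, 0.125, 0.0625, 0.03125, a decaying tail that the encoder subtracts from the weighted speech before the codebook search.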



5.10 VQ Target Vector Computation 

VQ TARGET VECTOR COMPUTATION (block 11) 

Input: SW, ZIR
Output: TARGET

Function: Subtract the zero-input response vector from the weighted speech vector.

Note: ZIR(K)=ZIRWIIR(IDIM+1-K) from block 10 above. It does not require a separate storage
location.

     For K=1,2,...,IDIM, do the next line
          TARGET(K) = SW(K) - ZIR(K)



5.11 Codebook Search Module (block 24)

The 7 blocks contained within the codebook search module (block 24) are specified below.
Again, some blocks are described as a single block for convenience and implementation
efficiency. Blocks 12, 14, and 15 are executed once every adaptation cycle when ICOUNT=3,
while the other blocks are executed once every speech vector.

IMPULSE RESPONSE VECTOR CALCULATOR (block 12) 

Input: A, AWZ, AWP
Output: H

Function: Compute the impulse response vector of the cascaded synthesis filter and perceptual
weighting filter.

This block is executed when ICOUNT=3 and after the execution of blocks 23 and 3 is completed
(i.e., when the new sets of A, AWZ, AWP coefficients are ready).

TEMP(1)=1.                                      | TEMP = synthesis filter memory
RC(1)=1.                                        | RC = W(z) all-pole part memory
For K=2,3,...,IDIM, do the following
   A0=0.
   A1=0.
   A2=0.
   For I=K,K-1,...,3,2, do the next 5 lines
      TEMP(I)=TEMP(I-1)
      RC(I)=RC(I-1)
      A0=A0-A(I)*TEMP(I)                        | Filtering.
      A1=A1+AWZ(I)*TEMP(I)
      A2=A2-AWP(I)*RC(I)
   TEMP(1)=A0
   RC(1)=A0+A1+A2
Repeat the above indented section for the next K

ITMP=IDIM+1                                     | Obtain h(n) by reversing
For K=1,2,...,IDIM, do the next line            | the order of the memory of
   H(K)=RC(ITMP-K)                              | all-pole section of W(z)
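
As a rough illustration of the procedure above, the following Python sketch computes the first IDIM samples of the cascaded impulse response. The function name and the 0-based layout (index 0 of each coefficient list holds the leading 1, i.e. A(1), AWZ(1), AWP(1)) are assumptions of this sketch; each list must have at least `idim` entries.

```python
def impulse_response(a, awz, awp, idim=5):
    """Impulse response of synthesis filter cascaded with W(z) (sketch)."""
    temp = [0.0] * idim  # synthesis filter memory, newest first
    rc = [0.0] * idim    # memory of the all-pole part of W(z), newest first
    temp[0] = 1.0
    rc[0] = 1.0
    for k in range(1, idim):
        a0 = a1 = a2 = 0.0
        for i in range(k, 0, -1):
            temp[i] = temp[i - 1]
            rc[i] = rc[i - 1]
            a0 -= a[i] * temp[i]    # all-pole synthesis filter
            a1 += awz[i] * temp[i]  # all-zero part of W(z)
            a2 -= awp[i] * rc[i]    # all-pole part of W(z)
        temp[0] = a0
        rc[0] = a0 + a1 + a2
    # The memory is stored newest first, so h(n) is the reversed memory.
    return [rc[idim - 1 - k] for k in range(idim)]
```

With no weighting and a one-pole synthesis filter 1/(1 - 0.5 z^-1), the result is the expected geometric sequence 1, 0.5, 0.25, and so on.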



SHAPE CODEVECTOR CONVOLUTION MODULE AND ENERGY TABLE CALCULATOR 

(blocks 14 and 15) 



Input: H, Y
Output: Y2

Function: Convolve each shape codevector with the impulse response obtained in block 12,
then compute and store the energy of the resulting vector.

This block is also executed when ICOUNT=3 after the execution of block 12 is completed.

For J=1,2,...,NCWD, do the following            | One codevector per loop.
   J1=(J-1)*IDIM
   For K=1,2,...,IDIM, do the next 4 lines
      K1=J1+K+1
      TEMP(K)=0.
      For I=1,2,...,K, do the next line
         TEMP(K)=TEMP(K)+H(I)*Y(K1-I)           | Convolution.
   Repeat the above 4 lines for the next K
   Y2(J)=0.
   For K=1,2,...,IDIM, do the next line
      Y2(J)=Y2(J)+TEMP(K)*TEMP(K)               | Compute energy.
Repeat the above for the next J



VQ TARGET VECTOR NORMALIZATION (block 16)

Input: TARGET, GAIN
Output: TARGET

Function: Normalize the VQ target vector using the predicted excitation gain.

TMP = 1. / GAIN
For K=1,2,...,IDIM, do the next line
   TARGET(K) = TARGET(K) * TMP



TIME-REVERSED CONVOLUTION MODULE (block 13) 

Input: H, TARGET (output from block 16)
Output: PN

Function: Perform time-reversed convolution of the impulse response vector and the
normalized VQ target vector (to obtain the vector p(n)).

Note: The vector PN can be kept in temporary storage.

For K=1,2,...,IDIM, do the following
   K1=K-1
   PN(K)=0.
   For J=K,K+1,...,IDIM, do the next line
      PN(K)=PN(K)+TARGET(J)*H(J-K1)

Repeat the above for the next K
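
The point of the time-reversed convolution is that the inner product of PN with a shape codevector equals the inner product of the target with that codevector filtered by H, so the filtering is done once instead of once per codevector. A small Python sketch of this identity follows; the helper names are illustrative and not part of the specification.

```python
def time_reversed_convolution(h, target):
    """pn[k] = sum over j >= k of target[j] * h[j - k] (0-based indices)."""
    idim = len(target)
    return [sum(target[j] * h[j - k] for j in range(k, idim))
            for k in range(idim)]


def truncated_convolution(h, y):
    """First len(y) samples of h convolved with y (as in blocks 14 and 15)."""
    return [sum(h[i] * y[k - i] for i in range(k + 1)) for k in range(len(y))]


def inner(u, v):
    """Plain inner product of two equal-length vectors."""
    return sum(p * q for p, q in zip(u, v))
```

For any codevector `y`, `inner(time_reversed_convolution(h, target), y)` matches `inner(target, truncated_convolution(h, y))` up to rounding.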
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ERROR CALCULATOR AND BEST CODEBOOK INDEX SELECTOR (blocks 17 and 18) 



Input: PN, Y, Y2, GB, G2, GSQ
Output: IG, IS, ICHAN

Function: Search through the gain codebook and the shape codebook to identify the best
combination of gain codebook index and shape codebook index, and combine the two to obtain
the 10-bit best codebook index.

Notes: The variable COR used below is usually kept in an accumulator, rather than storing it in
memory. The variables IDXG and J can be kept in temporary registers, while IG and IS can be
kept in memory.

Initialize DISTM to the largest number representable in the hardware
N1=NG/2
For J=1,2,...,NCWD, do the following
   J1=(J-1)*IDIM
   COR=0.
   For K=1,2,...,IDIM, do the next line
      COR=COR+PN(K)*Y(J1+K)                     | Compute inner product Pj.

   If COR > 0., then do the next 5 lines
      IDXG=N1
      For K=1,2,...,N1-1, do the next "if" statement
         If COR < GB(K)*Y2(J), do the next 2 lines
            IDXG=K                              | Best positive gain found.
            GO TO LABEL

   If COR <= 0., then do the next 5 lines
      IDXG=NG
      For K=N1+1,N1+2,...,NG-1, do the next "if" statement
         If COR > GB(K)*Y2(J), do the next 2 lines
            IDXG=K                              | Best negative gain found.
            GO TO LABEL

   LABEL: D=-G2(IDXG)*COR+GSQ(IDXG)*Y2(J)       | Compute distortion D.

   If D < DISTM, do the next 3 lines
      DISTM=D                                   | Save the lowest distortion
      IG=IDXG                                   | and the best codebook
      IS=J                                      | indices so far.

Repeat the above indented section for the next J

ICHAN=(IS-1)*NG+(IG-1)                          | Concatenate shape and gain
                                                | codebook indices.

Transmit ICHAN through communication channel.

For serial bit stream transmission, the most significant bit of ICHAN should be transmitted first.
If ICHAN is represented by the 10 bit word b9 b8 b7 b6 b5 b4 b3 b2 b1 b0, then the order of the
transmitted bits should be b9, then b8, then b7, and so on, ending with b0.
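
The search above can be sketched in Python as follows. This is only an illustration under stated assumptions: the function name, the list-of-lists layout for the shape codebook, and the use of `float("inf")` for the initial DISTM are choices made here, and indices are returned 1-based to match the pseudocode.

```python
def search_codebook(pn, shapes, y2, gb, g2, gsq):
    """Return 1-based (IG, IS) and the channel index ICHAN (sketch).

    pn     -- time-reversed convolution output (block 13)
    shapes -- shape codevectors, one list per codevector
    y2     -- energies of the filtered codevectors (blocks 14 and 15)
    gb, g2, gsq -- gain-related tables, each of length NG
    """
    ng = len(g2)
    n1 = ng // 2
    distm, ig, is_ = float("inf"), 0, 0
    for j, shape in enumerate(shapes, start=1):
        cor = sum(p * s for p, s in zip(pn, shape))
        if cor > 0.0:
            idxg = n1                       # default: largest positive gain
            for k in range(1, n1):
                if cor < gb[k - 1] * y2[j - 1]:
                    idxg = k                # first decision boundary exceeded
                    break
        else:
            idxg = ng                       # default: largest negative gain
            for k in range(n1 + 1, ng):
                if cor > gb[k - 1] * y2[j - 1]:
                    idxg = k
                    break
        d = -g2[idxg - 1] * cor + gsq[idxg - 1] * y2[j - 1]
        if d < distm:
            distm, ig, is_ = d, idxg, j
    return ig, is_, (is_ - 1) * ng + (ig - 1)
```

Because the gain levels are ordered, each shape needs at most a few comparisons against the boundaries in GB rather than a full evaluation of every gain level.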



5.12 Simulated Decoder (block 8) 

Blocks 20 and 23 have been described earlier. Blocks 19, 21, and 22 are specified below.

EXCITATION VQ CODEBOOK (block 19)

Input: IG, IS
Output: YN

Function: Perform table look-up to extract the best shape codevector and the best gain, then
multiply them to get the quantized excitation vector.

NN = (IS-1)*IDIM
For K=1,2,...,IDIM, do the next line
   YN(K) = GQ(IG) * Y(NN+K)

GAIN SCALING UNIT (block 21)

Input: GAIN, YN
Output: ET

Function: Multiply the quantized excitation vector by the excitation gain.

For K=1,2,...,IDIM, do the next line
   ET(K) = GAIN * YN(K)



SYNTHESIS FILTER (block 22) 

Input: ET, A
Output: ST

Function: Filter the gain-scaled excitation vector to obtain the quantized speech vector.
As explained in Section 3, this block can be omitted and the quantized speech vector can be
obtained as a by-product of the memory update procedure to be described below. If, however, one
wishes to implement this block anyway, a separate set of filter memory (rather than STATELPC)
should be used for this all-pole synthesis filter.

5.13 Filter Memory Update for Blocks 9 and 10

The following description of the filter memory update procedures for blocks 9 and 10 assumes
that the quantized speech vector ST is obtained as a by-product of the memory updates. To
safeguard against possible overloading of signal levels, a magnitude limiter is built into the
procedure so that the filter memory clips at MAX and MIN, where MAX and MIN are respectively
the positive and negative saturation levels of A-law or μ-law PCM, depending on which law is used.

FILTER MEMORY UPDATE (blocks 9 and 10)

Input: ET, A, AWZ, AWP, STATELPC, ZIRWFIR, ZIRWIIR
Output: ST, STATELPC, ZIRWFIR, ZIRWIIR

Function: Update the filter memory of blocks 9 and 10 and also obtain the quantized speech
vector.

ZIRWFIR(1)=ET(1)                                | ZIRWFIR now a scratch array.
TEMP(1)=ET(1)
For K=2,3,...,IDIM, do the following
   A0=ET(K)
   A1=0.
   A2=0.
   For I=K,K-1,...,2, do the next 5 lines
      ZIRWFIR(I)=ZIRWFIR(I-1)
      TEMP(I)=TEMP(I-1)
      A0=A0-A(I)*ZIRWFIR(I)                     | Compute zero-state responses
      A1=A1+AWZ(I)*ZIRWFIR(I)                   | at various stages of the
      A2=A2-AWP(I)*TEMP(I)                      | cascaded filter.
   ZIRWFIR(1)=A0
   TEMP(1)=A0+A1+A2
Repeat the above indented section for the next K

                                                | Now update filter memory by adding
                                                | zero-state responses to zero-input
                                                | responses
For K=1,2,...,IDIM, do the next 4 lines
   STATELPC(K)=STATELPC(K)+ZIRWFIR(K)
   If STATELPC(K) > MAX, set STATELPC(K)=MAX    | Limit the range.
   If STATELPC(K) < MIN, set STATELPC(K)=MIN
   ZIRWIIR(K)=ZIRWIIR(K)+TEMP(K)

For I=1,2,...,LPCW, do the next line            | Now set ZIRWFIR to the
   ZIRWFIR(I)=STATELPC(I)                       | right value.

I=IDIM+1
For K=1,2,...,IDIM, do the next line            | Obtain quantized speech by
   ST(K)=STATELPC(I-K)                          | reversing order of synthesis
                                                | filter memory.



5.14 Decoder (Figure 3/G.728)

The blocks in the decoder (Figure 3/G.728) are described below. Except for the output PCM
format conversion block, all other blocks are exactly the same as the blocks in the simulated
decoder (block 8) in Figure 2/G.728.

The decoder only uses a subset of the variables in Table 2/G.728. If a decoder and an encoder
are to be implemented in a single DSP chip, then the decoder variables should be given different
names to avoid overwriting the variables used in the simulated decoder block of the encoder. For
example, to name the decoder variables, we can add a prefix "d" to the corresponding variable
names in Table 2/G.728. If a decoder is to be implemented as a stand-alone unit independent of
an encoder, then there is no need to change the variable names.

The following description assumes a stand-alone decoder. Again, the blocks are executed in
the same order they are described below.

DECODER BACKWARD SYNTHESIS FILTER ADAPTER (block 33) 

Input: ST
Output: A

Function: Generate synthesis filter coefficients periodically from previously decoded speech.
The operation of this block is exactly the same as block 23 of the encoder.

DECODER BACKWARD VECTOR GAIN ADAPTER (block 30)

Input: ET
Output: GAIN

Function: Generate the excitation gain from previous gain-scaled excitation vectors.
The operation of this block is exactly the same as block 20 of the encoder.

DECODER EXCITATION VQ CODEBOOK (block 29)

Input: ICHAN
Output: YN

Function: Decode the received best codebook index (channel index) to obtain the excitation
vector.

This block first extracts the 3-bit gain codebook index IG and the 7-bit shape codebook index IS
from the received 10-bit channel index. Then, the rest of the operation is exactly the same as
block 19 of the encoder.

ITMP = integer part of (ICHAN / NG)             | Decode (IS-1).
IG = ICHAN - ITMP * NG + 1                      | Decode IG.
NN = ITMP * IDIM
For K=1,2,...,IDIM, do the next line
   YN(K) = GQ(IG) * Y(NN+K)
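
The index packing in the encoder and the unpacking above are inverses of each other. A minimal Python sketch, assuming NG = 8 (which follows from the 3-bit gain index) and using illustrative function names:

```python
NG = 8  # gain codebook size implied by the 3-bit gain index


def pack_index(ig, is_):
    """Combine 1-based gain index IG and shape index IS into ICHAN."""
    return (is_ - 1) * NG + (ig - 1)


def unpack_index(ichan):
    """Recover the 1-based (IG, IS) pair from a 10-bit channel index."""
    itmp = ichan // NG          # integer part of ICHAN / NG, equals IS-1
    ig = ichan - itmp * NG + 1
    return ig, itmp + 1


def channel_bits(ichan):
    """Bits of ICHAN in transmission order, most significant bit first."""
    return [(ichan >> b) & 1 for b in range(9, -1, -1)]
```

The shape index occupies the upper 7 bits and the gain index the lower 3 bits of the transmitted word.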



DECODER GAIN SCALING UNIT (block 31) 

Input: GAIN, YN
Output: ET

Function: Multiply the excitation vector by the excitation gain.
The operation of this block is exactly the same as block 21 of the encoder.



DECODER SYNTHESIS FILTER (block 32) 



Input: ET, A, STATELPC
Output: ST

Function: Filter the gain-scaled excitation vector to obtain the decoded speech vector.

This block can be implemented as a straightforward all-pole filter. However, as mentioned in
Section 4.3, if the encoder obtains the quantized speech as a by-product of filter memory update
(to save computation), and if potential accumulation of round-off error is a concern, then this
block should compute the decoded speech in exactly the same way as in the simulated decoder
block of the encoder. That is, the decoded speech vector should be computed as the sum of the
zero-input response vector and the zero-state response vector of the synthesis filter. This can be
done by the following procedure.

For K=1,2,...,IDIM, do the next 7 lines
   TEMP(K)=0.
   For J=LPC,LPC-1,...,3,2, do the next 2 lines
      TEMP(K)=TEMP(K)-STATELPC(J)*A(J+1)        | Zero-input response.
      STATELPC(J)=STATELPC(J-1)
   TEMP(K)=TEMP(K)-STATELPC(1)*A(2)             | Handle last one
   STATELPC(1)=TEMP(K)                          | differently.
Repeat the above for the next K

TEMP(1)=ET(1)
For K=2,3,...,IDIM, do the next 5 lines
   A0=ET(K)
   For I=K,K-1,...,2, do the next 2 lines
      TEMP(I)=TEMP(I-1)
      A0=A0-A(I)*TEMP(I)                        | Compute zero-state response.
   TEMP(1)=A0
Repeat the above 5 lines for the next K

                                                | Now update filter memory by adding
                                                | zero-state responses to zero-input
                                                | responses
For K=1,2,...,IDIM, do the next 3 lines
   STATELPC(K)=STATELPC(K)+TEMP(K)              | ZIR + ZSR
   If STATELPC(K) > MAX, set STATELPC(K)=MAX    | Limit the range.
   If STATELPC(K) < MIN, set STATELPC(K)=MIN

I=IDIM+1
For K=1,2,...,IDIM, do the next line            | Obtain quantized speech by
   ST(K)=STATELPC(I-K)                          | reversing order of synthesis
                                                | filter memory.


10th-ORDER LPC INVERSE FILTER (block 81)

This block is executed once a vector, and the output vector is written sequentially into the last 20
samples of the LPC prediction residual buffer (i.e. D(81) through D(100)). We use a pointer IP to
point to the address of D(K) array samples to be written to. This pointer IP is initialized to
NPWSZ-NFRSZ+IDIM before this block starts to process the first decoded speech vector of the
first adaptation cycle (frame), and from there on IP is updated in the way described below. The
10th-order LPC predictor coefficients APF(I)'s are obtained in the middle of Levinson-Durbin
recursion by block 50, as described in Section 4.6. It is assumed that before this block starts
execution, the decoder synthesis filter (block 32 of Figure 3/G.728) has already written the current
decoded speech vector into ST(1) through ST(IDIM).

Input: ST, APF
Output: D

Function: Compute the LPC prediction residual for the current decoded speech vector.

If IP = NPWSZ, then set IP = NPWSZ - NFRSZ      | check & update IP
For K=1,2,...,IDIM, do the next 7 lines
   ITMP=IP+K
   D(ITMP) = ST(K)
   For J=10,9,...,3,2, do the next 2 lines
      D(ITMP) = D(ITMP) + STLPCI(J)*APF(J+1)    | FIR filtering.
      STLPCI(J) = STLPCI(J-1)                   | Memory shift.
   D(ITMP) = D(ITMP) + STLPCI(1)*APF(2)         | Handle last one.
   STLPCI(1) = ST(K)                            | shift in input.

IP = IP + IDIM                                  | update IP.

PITCH PERIOD EXTRACTION MODULE (block 82)

This block is executed once a frame at the third vector of each frame, after the third decoded
speech vector is generated.

Input: D
Output: KP

Function: Extract the pitch period from the LPC prediction residual

If ICOUNT ≠ 3, skip the execution of this block;
Otherwise, do the following.

                                                | lowpass filtering & 4:1 downsampling.
For K=NPWSZ-NFRSZ+1,...,NPWSZ, do the next 7 lines
   TMP=D(K)-STLPF(1)*AL(1)-STLPF(2)*AL(2)-STLPF(3)*AL(3)           | IIR filter
   If K is divisible by 4, do the next 2 lines
      N=K/4                                     | do FIR filtering only if needed.
      DEC(N)=TMP*BL(1)+STLPF(1)*BL(2)+STLPF(2)*BL(3)+STLPF(3)*BL(4)
   STLPF(3)=STLPF(2)
   STLPF(2)=STLPF(1)                            | shift lowpass filter memory.
   STLPF(1)=TMP

M1 = KPMIN/4                                    | start correlation peak-picking in
M2 = KPMAX/4                                    | the decimated LPC residual domain.
CORMAX = most negative number of the machine
For J=M1,M1+1,...,M2, do the next 6 lines
   TMP=0.
   For N=1,2,...,NPWSZ/4, do the next line
      TMP=TMP+DEC(N)*DEC(N-J)                   | TMP = correlation in decimated domain
   If TMP > CORMAX, do the next 2 lines
      CORMAX=TMP                                | find maximum correlation and
      KMAX=J                                    | the corresponding lag.

For N=-M2+1,-M2+2,...,(NPWSZ-NFRSZ)/4, do the next line
   DEC(N)=DEC(N+IDIM)                           | shift decimated LPC residual buffer.

M1=4*KMAX-3                                     | start correlation peak-picking in undecimated domain
M2=4*KMAX+3
If M1 < KPMIN, set M1 = KPMIN.                  | check whether M1 out of range.
If M2 > KPMAX, set M2 = KPMAX.                  | check whether M2 out of range.
CORMAX = most negative number of the machine
For J=M1,M1+1,...,M2, do the next 6 lines
   TMP=0.
   For K=1,2,...,NPWSZ, do the next line
      TMP=TMP+D(K)*D(K-J)                       | correlation in undecimated domain.
   If TMP > CORMAX, do the next 2 lines
      CORMAX=TMP                                | find maximum correlation and
      KP=J                                      | the corresponding lag.

M1 = KP1 - KPDELTA                              | determine the range of search around
M2 = KP1 + KPDELTA                              | the pitch period of previous frame.
If KP < M2+1, go to LABEL.                      | KP can't be a multiple pitch if true
If M1 < KPMIN, set M1 = KPMIN.                  | check whether M1 out of range.
CMAX = most negative number of the machine
For J=M1,M1+1,...,M2, do the next 6 lines
   TMP=0.
   For K=1,2,...,NPWSZ, do the next line
      TMP=TMP+D(K)*D(K-J)                       | correlation in undecimated domain.
   If TMP > CMAX, do the next 2 lines
      CMAX=TMP                                  | find maximum correlation and
      KPTMP=J                                   | the corresponding lag.

SUM=0.
TMP=0.                                          | start computing the tap weights
For K=1,2,...,NPWSZ, do the next 2 lines
   SUM = SUM + D(K-KP)*D(K-KP)
   TMP = TMP + D(K-KPTMP)*D(K-KPTMP)
If SUM=0, set TAP=0; otherwise, set TAP=CORMAX/SUM.
If TMP=0, set TAP1=0; otherwise, set TAP1=CMAX/TMP.
If TAP > 1, set TAP = 1.                        | clamp TAP between 0 and 1
If TAP < 0, set TAP = 0.
If TAP1 > 1, set TAP1 = 1.                      | clamp TAP1 between 0 and 1
If TAP1 < 0, set TAP1 = 0.

                                                | Replace KP with fundamental pitch if
                                                | TAP1 is large enough.
If TAP1 > TAPTH * TAP, then set KP = KPTMP.

LABEL: KP1 = KP                                 | update pitch period of previous frame

For K=-KPMAX+1,-KPMAX+2,...,NPWSZ-NFRSZ, do the next line
   D(K) = D(K+NFRSZ)                            | shift the LPC residual buffer



PITCH PREDICTOR TAP CALCULATOR (block 83)

This block is also executed once a frame at the third vector of each frame, right after the execution
of block 82. This block shares the decoded speech buffer (ST(K) array) with the long-term
postfilter 71, which takes care of the shifting of the array such that ST(1) through ST(IDIM)
constitute the current vector of decoded speech, and ST(-KPMAX-NPWSZ+1) through ST(0) are
previous vectors of decoded speech.

Input: ST, KP
Output: PTAP

Function: Calculate the optimal tap weight of the single-tap pitch predictor of the decoded
speech.

If ICOUNT ≠ 3, skip the execution of this block;
Otherwise, do the following.

SUM=0.
TMP=0.
For K=-NPWSZ+1,-NPWSZ+2,...,0, do the next 2 lines
   SUM = SUM + ST(K-KP)*ST(K-KP)
   TMP = TMP + ST(K)*ST(K-KP)
If SUM=0, set PTAP=0; otherwise, set PTAP=TMP/SUM.
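
The tap computed above is the least-squares gain of a one-tap predictor at lag KP. A hedged Python sketch follows; the buffer layout is an assumption made for the example, with `hist[-1]` standing for ST(0) and older samples before it, and `hist` holding at least `npwsz + kp` samples.

```python
def pitch_predictor_tap(hist, kp, npwsz):
    """Single-tap pitch predictor weight over the last npwsz samples."""
    last = len(hist) - 1          # position of ST(0)
    s = 0.0                       # energy of the lagged signal
    t = 0.0                       # cross-correlation at lag kp
    for k in range(-npwsz + 1, 1):
        cur = hist[last + k]
        lag = hist[last + k - kp]
        s += lag * lag
        t += cur * lag
    return 0.0 if s == 0 else t / s
```

A perfectly periodic signal with period `kp` gives a tap of 1, and a signal that halves every period gives 0.5.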



LONG-TERM POSTFILTER COEFFICIENT CALCULATOR (block 84) 

This block is also executed once a frame at the third vector of each frame, right after the execution 
of block 83. 

Input: PTAP
Output: B, GL

Function: Calculate the coefficient and the scaling factor g_l of the long-term postfilter.

If ICOUNT ≠ 3, skip the execution of this block;
Otherwise, do the following.

If PTAP > 1, set PTAP = 1.                      | clamp PTAP at 1.
If PTAP < PPFTH, set PTAP = 0.                  | turn off pitch postfilter if
                                                | PTAP smaller than threshold.
B = PPFZCF * PTAP
GL = 1 / (1+B)
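
In Python the coefficient calculation above might look like the sketch below. The threshold and scaling constants are passed in as arguments because their values are not given in this section; the function name is illustrative.

```python
def long_term_postfilter_coeffs(ptap, ppfth, ppfzcf):
    """Return (B, GL) for the long-term postfilter (sketch)."""
    ptap = min(ptap, 1.0)         # clamp the tap at 1
    if ptap < ppfth:
        ptap = 0.0                # weak periodicity: switch the filter off
    b = ppfzcf * ptap
    gl = 1.0 / (1.0 + b)          # keeps the peak gain of the comb at 1
    return b, gl
```

When the tap falls below the threshold the result is `(0.0, 1.0)`, so the long-term postfilter passes the signal through unchanged.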
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SHORT-TERM POST FILTER COEFFICIENT CALCULATOR (block 85) 

This block is also executed once a frame, but it is executed at the first vector of each frame.

Input: APF, RCTMP(1)
Output: AP, AZ, TILTZ

Function: Calculate the coefficients of the short-term postfilter.

If ICOUNT ≠ 1, skip the execution of this block;
Otherwise, do the following.

For I=2,3,...,11, do the next 2 lines
   AP(I)=SPFPCFV(I)*APF(I)                      | scale denominator coeff.
   AZ(I)=SPFZCFV(I)*APF(I)                      | scale numerator coeff.
TILTZ=TILTF*RCTMP(1)                            | tilt compensation filter coeff.



LONG-TERM POSTFILTER (block 71)

This block is executed once a vector.

Input: ST, B, GL, KP
Output: TEMP

Function: Perform filtering operation of the long-term postfilter.

For K=1,2,...,IDIM, do the next line
   TEMP(K)=GL*(ST(K)+B*ST(K-KP))                | long-term postfiltering.

For K=-NPWSZ-KPMAX+1,...,-2,-1,0, do the next line
   ST(K)=ST(K+IDIM)                             | shift decoded speech buffer.

SHORT-TERM POSTFILTER (block 72)

This block is executed once a vector right after the execution of block 71.

Input: AP, AZ, TILTZ, STPFFIR, STPFIIR, TEMP (output of block 71)
Output: TEMP

Function: Perform filtering operation of the short-term postfilter.

For K=1,2,...,IDIM, do the following
   TMP = TEMP(K)
   For J=10,9,...,3,2, do the next 2 lines
      TEMP(K) = TEMP(K) + STPFFIR(J)*AZ(J+1)    | All-zero part
      STPFFIR(J) = STPFFIR(J-1)                 | of the filter.
   TEMP(K) = TEMP(K) + STPFFIR(1)*AZ(2)         | Last multiplier.
   STPFFIR(1) = TMP
   For J=10,9,...,3,2, do the next 2 lines
      TEMP(K) = TEMP(K) - STPFIIR(J)*AP(J+1)    | All-pole part
      STPFIIR(J) = STPFIIR(J-1)                 | of the filter.
   TEMP(K) = TEMP(K) - STPFIIR(1)*AP(2)         | Last multiplier.
   STPFIIR(1) = TEMP(K)
   TEMP(K) = TEMP(K) + STPFIIR(2)*TILTZ         | Spectral tilt compensation filter.



SUM OF ABSOLUTE VALUE CALCULATOR (block 73)

This block is executed once a vector after execution of block 32.

Input: ST
Output: SUMUNFIL

Function: Calculate the sum of absolute values of the components of the decoded speech
vector.

SUMUNFIL=0.
For K=1,2,...,IDIM, do the next line
   SUMUNFIL = SUMUNFIL + absolute value of ST(K)


SUM OF ABSOLUTE VALUE CALCULATOR (block 74)

This block is executed once a vector after execution of block 72.

Input: TEMP (output of block 72)
Output: SUMFIL

Function: Calculate the sum of absolute values of the components of the short-term postfilter
output vector.

SUMFIL=0.
For K=1,2,...,IDIM, do the next line
   SUMFIL = SUMFIL + absolute value of TEMP(K)

SCALING FACTOR CALCULATOR (block 75)

This block is executed once a vector after execution of blocks 73 and 74.

Input: SUMUNFIL, SUMFIL
Output: SCALE

Function: Calculate the overall scaling factor of the postfilter

If SUMFIL > 1, set SCALE = SUMUNFIL / SUMFIL;
Otherwise, set SCALE = 1.

FIRST-ORDER LOWPASS FILTER (block 76) and OUTPUT GAIN SCALING UNIT (block 77)

These two blocks are executed once a vector after execution of blocks 72 and 75. It is more
convenient to describe the two blocks together.

Input: SCALE, TEMP (output of block 72)
Output: SPF

Function: Lowpass filter the once-a-vector scaling factor and use the filtered scaling factor to
scale the short-term postfilter output vector.

For K=1,2,...,IDIM, do the following
   SCALEFIL = AGCFAC*SCALEFIL + (1-AGCFAC)*SCALE        | lowpass filtering
   SPF(K) = SCALEFIL*TEMP(K)                            | scale output.
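
Blocks 73 to 77 together form an automatic gain control that restores the level lost or gained in postfiltering. A compact Python sketch of the chain follows; the function name and argument order are illustrative, and AGCFAC is passed in because its value is not stated in this section.

```python
def postfilter_gain_control(st, temp, scalefil, agcfac):
    """Blocks 73-77 in one pass (sketch).

    st       -- unfiltered decoded speech vector
    temp     -- short-term postfilter output vector
    scalefil -- smoothed scaling factor carried over from the last vector
    agcfac   -- smoothing constant of the first-order lowpass filter
    """
    sumunfil = sum(abs(x) for x in st)       # block 73
    sumfil = sum(abs(x) for x in temp)       # block 74
    scale = sumunfil / sumfil if sumfil > 1.0 else 1.0   # block 75
    spf = []
    for x in temp:                           # blocks 76 and 77
        scalefil = agcfac * scalefil + (1.0 - agcfac) * scale
        spf.append(scalefil * x)
    return spf, scalefil
```

The returned `scalefil` has to be fed back in on the next vector, since the lowpass filter state persists across vectors.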



OUTPUT PCM FORMAT CONVERSION (block 28)

Input: SPF
Output: SD

Function: Convert the 5 components of the decoded speech vector into 5 corresponding A-law
or μ-law PCM samples and put them out sequentially at 125 μs time intervals.

The conversion rules from uniform PCM to A-law or μ-law PCM are specified in
Recommendation G.711.
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ANNEX A 
(to Recommendation G.728) 

HYBRID WINDOW FUNCTIONS FOR VARIOUS LPC ANALYSES IN LD-CELP 



In the LD-CELP coder, we use three separate LPC analyses to update the coefficients of three
filters: (1) the synthesis filter, (2) the log-gain predictor, and (3) the perceptual weighting filter.
Each of these three LPC analyses has its own hybrid window. For each hybrid window, we list the
values of window function samples that are used in the hybrid windowing calculation procedure.
These window functions were first designed using floating-point arithmetic and then quantized to
the numbers which can be exactly represented by 16-bit representations with 15 bits of fraction.
For each window, we will first give a table containing the floating-point equivalent of the 16-bit
numbers and then give a table with corresponding 16-bit integer representations.
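
The relation between the two kinds of table is a plain scaling by 2^15. A small Python sketch, with helper names chosen here for illustration:

```python
def q15_to_float(n):
    """Convert a 16-bit integer with 15 fraction bits to its real value."""
    return n / 32768.0


def float_to_q15(x):
    """Quantize a real value in [-1, 1) to the nearest Q15 integer."""
    return round(x * 32768.0)
```

Formatting `q15_to_float(n)` to nine decimals reproduces the floating-point entries, and `float_to_q15` maps them back to the integer entries.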

A.1 Hybrid Window for the Synthesis Filter

The following table contains the first 105 samples of the window function for the synthesis
filter. The first 35 samples are the non-recursive portion, and the rest are the recursive portion.
The table should be read from left to right from the first row, then left to right for the second row,
and so on (just like the raster scan line).



0.047760010   0.095428467   0.142852783   0.189971924   0.236663818
0.282775879   0.328277588   0.373016357   0.416900635   0.459838867
0.501739502   0.542480469   0.582000732   0.620178223   0.656921387
0.692199707   0.725891113   0.757904053   0.788208008   0.816680908
0.843322754   0.868041992   0.890747070   0.911437988   0.930053711
0.946533203   0.960876465   0.973022461   0.982910156   0.990600586
0.996002197   0.999114990   0.999969482   0.998565674   0.994842529
0.988861084   0.981781006   0.974731445   0.967742920   0.960815430
0.953948975   0.947082520   0.940307617   0.933563232   0.926879883
0.920227051   0.913635254   0.907104492   0.900604248   0.894134521
0.887725830   0.881378174   0.875061035   0.868774414   0.862548828
0.856384277   0.850250244   0.844146729   0.838104248   0.832092285
0.826141357   0.820220947   0.814331055   0.808502197   0.802703857
0.796936035   0.791229248   0.785583496   0.779937744   0.774353027
0.768798828   0.763305664   0.757812500   0.752380371   0.747009277
0.741638184   0.736328125   0.731048584   0.725830078   0.720611572
0.715454102   0.710327148   0.705230713   0.700164795   0.695159912
0.690185547   0.685241699   0.680328369   0.675445557   0.670593262
0.665802002   0.661041260   0.656280518   0.651580811   0.646911621
0.642272949   0.637695313   0.633117676   0.628570557   0.624084473
0.619598389   0.615142822   0.610748291   0.606384277   0.602020264

The next table contains the corresponding 16-bit integer representation. Dividing the table entries
by 2^15 = 32768 gives the table above.

 1565    3127    4681    6225    7755
 9266   10757   12223   13661   15068
16441   17776   19071   20322   21526
22682   23786   24835   25828   26761
27634   28444   29188   29866   30476
31016   31486   31884   32208   32460
32637   32739   32767   32721   32599
32403   32171   31940   31711   31484
31259   31034   30812   30591   30372
30154   29938   29724   29511   29299
29089   28881   28674   28468   28264
28062   27861   27661   27463   27266
27071   26877   26684   26493   26303
26114   25927   25742   25557   25374
25192   25012   24832   24654   24478
24302   24128   23955   23784   23613
23444   23276   23109   22943   22779
22616   22454   22293   22133   21974
21817   21661   21505   21351   21198
21046   20896   20746   20597   20450
20303   20157   20013   19870   19727

A.2 Hybrid Window for the Log-Gain Predictor

The following table contains the first 34 samples of the window function for the log-gain
predictor. The first 20 samples are the non-recursive portion, and the rest are the recursive
portion. The table should be read in the same manner as the two tables above.



0.092346191   0.183868408   0.273834229   0.361480713   0.446014404
0.526763916   0.602996826   0.674072266   0.739379883   0.798400879
0.850585938   0.895507813   0.932769775   0.962066650   0.983154297
0.995819092   0.999969482   0.995635986   0.982757568   0.961486816
0.932006836   0.899078369   0.867309570   0.836669922   0.807128906
0.778625488   0.751129150   0.724578857   0.699005127   0.674316406
0.650482178   0.627502441   0.605346680   0.583953857



The next table contains the corresponding 16-bit integer representation. Dividing the table
entries by 2^15 = 32768 gives the table above.

 3026    6025    8973   11845   14615
17261   19759   22088   24228   26162
27872   29344   30565   31525   32216
32631   32767   32625   32203   31506
30540   29461   28420   27416   26448
25514   24613   23743   22905   22096
21315   20562   19836   19135


A.3 Hybrid Window for the Perceptual Weighting Filter

The following table contains the first 60 samples of the window function for the perceptual
weighting filter. The first 30 samples are the non-recursive portion, and the rest are the recursive
portion. The table should be read in the same manner as the four tables above.



0.059722900   0.119262695   0.178375244   0.236816406   0.294433594
0.351013184   0.406311035   0.460174561   0.512390137   0.562774658
0.611145020   0.657348633   0.701171875   0.742523193   0.781219482
0.817108154   0.850097656   0.880035400   0.906829834   0.930389404
0.950622559   0.967468262   0.980865479   0.990722656   0.997070313
0.999847412   0.999084473   0.994720459   0.986816406   0.975372314
0.960449219   0.943939209   0.927734375   0.911804199   0.896148682
0.880737305   0.865600586   0.850738525   0.836120605   0.821746826
0.807647705   0.793762207   0.780120850   0.766723633   0.753570557
0.740600586   0.727874756   0.715393066   0.703094482   0.691009521
0.679138184   0.667480469   0.656005859   0.644744873   0.633666992
0.622772217   0.612091064   0.601562500   0.591217041   0.581085205



The next table contains the corresponding 16-bit integer representation. Dividing the table
entries by 2^15 = 32768 gives the table above.

 1957    3908    5845    7760    9648
11502   13314   15079   16790   18441
20026   21540   22976   24331   25599
26775   27856   28837   29715   30487
31150   31702   32141   32464   32672
32763   32738   32595   32336   31961
31472   30931   30400   29878   29365
28860   28364   27877   27398   26927
26465   26010   25563   25124   24693
24268   23851   23442   23039   22643
22254   21872   21496   21127   20764
20407   20057   19712   19373   19041
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ANNEX B 
(to Recommendation G.728) 

EXCITATION SHAPE AND GAIN CODEBOOK TABLES 



This appendix first gives the 7-bit excitation VQ shape codebook table. Each row in the table
specifies one of the 128 shape codevectors. The first column is the channel index associated with
each shape codevector (obtained by a Gray-code index assignment algorithm). The second
through the sixth columns are the first through the fifth components of the 128 shape codevectors
as represented in 16-bit fixed point. To obtain the floating point value from the integer value,
divide the integer value by 2048. This is equivalent to multiplication by 2^-11 or shifting the binary
point 11 bits to the left.



Channel    Codevector
Index      Components

   0         668   -2950   -1254   -1790   -2553
   1       -5032   -4577   -1045    2908    3318
   2       -2819   -2677    -948   -2825   -4450
   3       -6679    -340    1482   -1276    1262
   4        -562   -6757    1281     179   -1274
   5       -2512   -7130   -4925    6913    2411
   6       -2478    -156    4683   -3873       0
   7       -8208    2140    -478   -2785     533
   8        1889    2759    1381   -6955   -5913
   9        5082   -2460   -5778    1797     568
  10       -2208   -3309   -4523   -6236   -7505
  11       -2719    4358   -2988   -1149    2664
  12        1259     995    2711   -2464  -10390

Next we give the values for the gain codebook. This table not only includes the values for GQ,
but also the values for GB, G2 and GSQ as well. Both GQ and GB can be represented exactly in
16-bit arithmetic using Q13 format. The fixed point representation of G2 is just the same as GQ,
except the format is now Q12. An approximate representation of GSQ to the nearest integer in
fixed point Q12 format will suffice.

Array
Index    1             2             3             4             5        6        7        8

GQ**     0.515625      0.90234375    1.579101563   2.763427734   -GQ(1)   -GQ(2)   -GQ(3)   -GQ(4)
GB       0.708984375   1.240722656   2.171264649   *             -GB(1)   -GB(2)   -GB(3)   *
G2       1.03125       1.8046875     3.158203126   5.526855468   -G2(1)   -G2(2)   -G2(3)   -G2(4)
GSQ      0.26586914    0.814224243   2.493561746   7.636532841   GSQ(1)   GSQ(2)   GSQ(3)   GSQ(4)

*  Can be any arbitrary value (not used).
** Note that GQ(1) = 33/64, and GQ(i) = (7/4)GQ(i-1) for i=2,3,4.

Table
Values of Gain Codebook Related Arrays
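
The GQ row follows directly from the note under the table, and the G2 and GSQ rows are numerically consistent with 2*GQ and GQ squared respectively. The Python sketch below regenerates those three rows with exact rational arithmetic; treating G2 and GSQ this way is an inference from the tabulated numbers rather than a definition given in this annex, and the function name is illustrative.

```python
from fractions import Fraction


def gain_tables():
    """Rebuild GQ, G2 and GSQ (8 entries each) from GQ(1) = 33/64."""
    gq_pos = [Fraction(33, 64)]
    for _ in range(3):
        gq_pos.append(gq_pos[-1] * Fraction(7, 4))   # GQ(i) = (7/4) GQ(i-1)
    gq = gq_pos + [-g for g in gq_pos]               # entries 5..8 are negated
    g2 = [2 * g for g in gq]                         # assumed: G2 = 2 * GQ
    gsq = [g * g for g in gq]                        # assumed: GSQ = GQ ** 2
    return gq, g2, gsq
```

Converting the fractions with `float()` gives values that agree with the table to the printed precision, apart from last-digit rounding in a few G2 entries.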
ANNEX C
(to Recommendation G.728)

VALUES USED FOR BANDWIDTH BROADENING

The following table gives the integer values for the pole control, zero control and bandwidth
broadening vectors listed in Table 2. To obtain the floating point value, divide the integer value
by 16384. The values in this table represent these floating point values in the Q14 format, the
most commonly used format to represent numbers less than 2 in 16 bit fixed point arithmetic.



  i     FACV

  1     16384
  2     16192
  3     16002
  4     15815
  5     15629
  6     15446
  7     15265
  8     15086
  9     14910
 10     14735
 11     14562
 12     14391
 13     14223
 14     14056
 15     13891
 16     13729
 17     13568
 18     13409
 19     13252
 20     13096
 21     12943
 22     12791
 23     12641
 24     12493
 25     12347
 26     12202
 27     12059
 28     11918
 29     11778
 30     11640
 31     11504
 32     11369
 33     11236
 35     10974
 36     10845
 37     10718
 38     10593
 39     10468
 40     10346
 41     10225
 42     10105
 43      9986
 44      9869
 45      9754
 46      9639
 47      9526
 48      9415
 49      9304
 50      9195
 51      9088
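
As a rough illustration of the Q14 representation, the sketch below converts the first few FACV entries to floating point and looks at the ratio between successive entries. The observation that each entry is close to 253/256 times the previous one is drawn from the tabulated numbers and is treated here as an assumption; the names FACV_HEAD and LAMBDA are hypothetical.

```python
# First five FACV entries from the table (Q14 integers).
FACV_HEAD = [16384, 16192, 16002, 15815, 15629]

# Q14 to floating point: divide by 2**14 = 16384.
facv_float = [v / 16384 for v in FACV_HEAD]

# Ratio between successive entries; each is close to 253/256.
LAMBDA = 253 / 256   # assumed bandwidth broadening factor
ratios = [b / a for a, b in zip(FACV_HEAD, FACV_HEAD[1:])]

# Regenerating the same entries by rounding LAMBDA**(i-1) to Q14.
regenerated = [round(16384 * LAMBDA**k) for k in range(len(FACV_HEAD))]
```

For these five entries the regenerated integers match the table, which is consistent with the table being powers of a single factor rounded to Q14.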



ANNEX D 
(to Recommendation G.728) 

COEFFICIENTS OF THE 1 kHz LOWPASS ELLIPTIC FILTER 
USED IN PITCH PERIOD EXTRACTION MODULE (BLOCK 82) 



The 1 kHz lowpass filter used in the pitch lag extraction and encoding module (block 82) is a
third-order pole-zero filter with a transfer function of

    L(z) = \frac{\sum_{i=0}^{3} b_i z^{-i}}{1 + \sum_{i=1}^{3} a_i z^{-i}}

where the coefficients a_i's and b_i's are given in the following table.

  i     a_i               b_i

  0                       0.0357081667
  1     -2.34036589      -0.0069956244
  2      2.01190019      -0.0069956244
  3     -0.614109218      0.0357081667
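
A direct-form realization of this lowpass filter might look like the sketch below. It assumes the transfer function has the b coefficients in the numerator and 1 plus the a coefficients in the denominator, so that the output is y(n) = sum of b_i x(n-i) minus sum of a_i y(n-i); the function name lowpass is hypothetical, and the code is a floating-point illustration only.

```python
# Coefficients from the table in this annex.
B = [0.0357081667, -0.0069956244, -0.0069956244, 0.0357081667]   # b_0 .. b_3
A = [-2.34036589, 2.01190019, -0.614109218]                      # a_1 .. a_3

def lowpass(x):
    """Filter the sequence x with the third-order pole-zero filter (zero initial state)."""
    y = []
    for n in range(len(x)):
        acc = sum(B[i] * x[n - i] for i in range(4) if n - i >= 0)
        acc -= sum(A[i - 1] * y[n - i] for i in range(1, 4) if n - i >= 0)
        y.append(acc)
    return y

# Gain at zero frequency (z = 1): sum(b) / (1 + sum(a)).
dc_gain = sum(B) / (1.0 + sum(A))
```

Under this sign convention the gain at zero frequency comes out very close to 1, as expected of a lowpass filter, which supports the assumed form of the denominator.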
ANNEX E
(to Recommendation G.728)

TIME SCHEDULING THE SEQUENCE OF COMPUTATIONS

All of the computation in the encoder and decoder can be divided up into two classes.
Included in the first class are those computations which take place once per vector. Sections 3
through 5.14 note which computations these are. Generally they are the ones which involve or
lead to the actual quantization of the excitation signal and the synthesis of the output signal.
Referring specifically to the block numbers in Fig. 2, this class includes blocks 1, 2, 4, 9, 10, 11,
13, 16, 17, 18, 21, and 22. In Fig. 3, this class includes blocks 28, 29, 31, 32 and 34. In Fig. 6,
this class includes blocks 39, 40, 41, 42, 46, 47, 48, and 67. (Note that Fig. 6 is applicable to both
block 20 in Fig. 2 and block 30 in Fig. 3. Blocks 43, 44 and 45 of Fig. 6 are not part of this class.
Thus, blocks 20 and 30 are part of both classes.)

In the other class are those computations which are only done once for every four vectors.
Once more referring to Figures 2 through 8, this class includes blocks 3, 12, 14, 15, 23, 33, 35, 36,
37, 38, 43, 44, 45, 49, 50, 51, 81, 82, 83, 84, and 85. All of the computations in this second class
are associated with updating one or more of the adaptive filters or predictors in the coder. In the
encoder there are three such adaptive structures: the 50th order LPC synthesis filter, the vector
gain predictor, and the perceptual weighting filter. In the decoder there are four such structures: the
synthesis filter, the gain predictor, and the long term and short term adaptive postfilters. Included
in the descriptions of sections 3 through 5.14 are the times and input signals for each of these five
adaptive structures. Although it is redundant, this appendix explicitly lists all of this timing
information in one place for the convenience of the reader. The following table summarizes the
five adaptive structures, their input signals, their times of computation and the time at which the
updated values are first used. For reference, the fourth column in the table refers to the block
numbers used in the figures and in sections 3, 4 and 5 as a cross reference to these computations.

By far, the largest amount of computation is expended in updating the 50th order synthesis
filter. The input signal required is the synthesis filter output speech (ST). As soon as the fourth
vector in the previous cycle has been decoded, the hybrid window method for computing the
autocorrelation coefficients can commence (block 49). When it is completed, Durbin's recursion
to obtain the prediction coefficients can begin (block 50). In practice we found it necessary to
stretch this computation over more than one vector cycle. We begin the hybrid window
computation before vector 1 has been fully received. Before Durbin's recursion can be fully
completed, we must interrupt it to encode vector 1. Durbin's recursion is not completed until
vector 2. Finally bandwidth expansion (block 51) is applied to the predictor coefficients. The
results of this calculation are not used until the encoding or decoding of vector 3 because in the
encoder we need to combine these updated values with the update of the perceptual weighting
filter and codevector energies. These updates are not available until vector 3.

The gain adaptation proceeds in two fashions. The adaptive predictor is updated once every
four vectors. However, the adaptive predictor produces a new gain value once per vector. In this
section we are describing the timing of the update of the predictor. To compute this requires first
performing the hybrid window method on the previous log gains (block 43), then Durbin's
recursion (block 44), and bandwidth expansion (block 45). All of this can be completed during
vector 2 using the log gains available up through vector 1. If the result of Durbin's recursion
indicates there is no singularity, then the new gain predictor is used immediately in the encoding
of vector 2.

The perceptual weighting filter update is computed during vector 3. The first part of this
update is performing the LPC analysis on the input speech up through vector 2. We can begin this
computation immediately after vector 2 has been encoded, not waiting for vector 3 to be fully
received. This consists of performing the hybrid window method (block 36), Durbin's recursion
(block 37) and the weighting filter coefficient calculations (block 38). Next we need to combine
the perceptual weighting filter with the updated synthesis filter in the impulse response
vector calculator (block 12). We also must convolve every shape codevector with this impulse
response to find the codevector energies (blocks 14 and 15). As soon as these computations are
completed, we can immediately use all of the updated values in the encoding of vector 3. (Note:
Because the computation of codevector energies is fairly intensive, we were unable to complete
the perceptual weighting filter update as part of the computation during the time of vector 2, even
if the gain predictor update were moved elsewhere. This is why it was deferred to vector 3.)

The long term adaptive postfilter is updated on the basis of a fast pitch extraction algorithm
which uses the synthesis filter output speech (ST) for its input. Since the postfilter is only used in
the decoder, scheduling time to perform this computation was based on the other computational
loads in the decoder. The decoder does not have to update the perceptual weighting filter and
codevector energies, so the time slot of vector 3 is available. The codeword for vector 3 is
decoded and its synthesis filter output speech is available together with all previous synthesis
output vectors. These are input to the adapter which then produces the new pitch period (blocks
81 and 82) and long-term postfilter coefficient (blocks 83 and 84). These new values are
immediately used in calculating the postfiltered output for vector 3.

The short term adaptive postfilter is updated as a by-product of the synthesis filter update. 
Durbin's recursion is stopped at order 10 and the prediction coefficients are saved for the postfilter 
update. Since the Durbin computation is usually begun during vector 1, the short term adaptive 
postfilter update is completed in time for the postfiltering of output vector 1. 
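
The idea of saving the order-10 coefficients as a by-product of a higher-order Durbin recursion can be sketched as follows. This is a plain floating-point Levinson-Durbin recursion written for illustration, not the block 50 procedure of the Recommendation; the function name durbin, the save_order parameter and the sign convention A(z) = 1 + sum of a_i z^-i are assumptions of this sketch.

```python
def durbin(r, order, save_order=None):
    """Levinson-Durbin recursion on autocorrelation values r[0..order].

    Returns (a, err, saved): the final coefficients a[0..order] with a[0] = 1,
    the final prediction error, and a copy of the coefficients at save_order.
    """
    a = [1.0] + [0.0] * order
    err = r[0]
    saved = None
    for m in range(1, order + 1):
        if err <= 0.0:
            raise ValueError("singular autocorrelation matrix")  # caller keeps old predictor
        acc = r[m] + sum(a[j] * r[m - j] for j in range(1, m))
        k = -acc / err                      # reflection coefficient for order m
        new_a = a[:]
        for j in range(1, m):
            new_a[j] = a[j] + k * a[m - j]
        new_a[m] = k
        a = new_a
        err *= 1.0 - k * k
        if m == save_order:
            saved = a[: m + 1]              # intermediate solution, e.g. order 10
    return a, err, saved

# Example: autocorrelation of a first-order process, r[k] = 0.5**k.
a, err, saved = durbin([1.0, 0.5, 0.25, 0.125, 0.0625], order=4, save_order=2)
```

Because each step of the recursion yields the complete solution for that order, a call such as durbin(r, 50, save_order=10) gives the order-10 predictor at no extra cost, which is the property the text relies on.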
APPENDIX I
(to Recommendation G.728) 

IMPLEMENTATION VERIFICATION 



A set of verification tools have been designed in order to facilitate the compliance verification 
of different implementations to the algorithm defined in this Recommendation. These verification 
tools are available from the ITU on a set of distribution diskettes. 
This Appendix describes the digital test sequences and the measurement software to be used for implementation
verification. These verification tools are available from the ITU on a set of verification diskettes.



I.1 Verification principle

The LD-CELP algorithm specification is formulated in a non-bitexact manner to allow for simple implementation
on different kinds of hardware. This implies that the verification procedure can not assume the implementation under test
to be exactly equal to any reference implementation. Hence, objective measurements are needed to establish the degree of
deviation between test and reference. If this measured deviation is found to be sufficiently small, the test implementation
is assumed to be interoperable with any other implementation passing the test. Since no finite length test is capable of
testing every aspect of an implementation, 100% certainty that an implementation is correct can never be guaranteed.
However, the test procedure described exercises all main parts of the LD-CELP algorithm and should be a valuable tool for
the implementor.

The verification procedures described in this appendix have been designed with 32 bit floating-point
implementations in mind. Although they could be applied to any LD-CELP implementation, 32 bit floating-point format will
probably be needed to fulfill the test requirements. Verification procedures that could permit a fixed-point algorithm to be
realized are currently under study.



I.2 Test configurations

This section describes how the different test sequences and measurement programs should be used together to
perform the verification tests. The procedure is based on black-box testing at the interfaces SU and ICHAN of the test
encoder and ICHAN and SPF of the test decoder. The signals SU and SPF are represented in 16 bits fixed point precision
as described in Section I.4.2. A possibility to turn off the adaptive postfilter should be provided in the decoder
implementation. All test sequence processing should be started with the test implementation in the initial reset state, as
defined by the LD-CELP recommendation. Three measurement programs, CWCOMP, SNR and WSNR, are needed to
perform the test output sequence evaluations. These programs are further described in Section I.3. Descriptions of the
different test configurations to be used are found in the following subsections (I.2.1-I.2.4).



I.2.1 Encoder test

I.2.2 Decoder test

The basic operation of the decoder is tested with the configuration in Figure I-2/G.728. A codeword test
sequence, CW, is applied to the decoder under test with the adaptive postfilter turned off. The output signal is then
compared to the reference output signal, OUTA, with the SNR program.

FIGURE I-2/G.728
Decoder test configuration (2)

I.2.3 Perceptual weighting filter test

The encoder perceptual weighting filter is tested with the configuration in Figure I-3/G.728. An input signal test
sequence, IN, is passed through the encoder under test and the quality of the output codewords is measured with the
WSNR program. The WSNR program also needs the input sequence to compute the correct distance measure.
I.3 Verification programs

This section describes the programs CWCOMP, SNR and WSNR, referred to in the test configuration section, as
well as the program LDCDEC provided as an implementor's debugging tool.

The verification software is written in Fortran and is kept as close to the ANSI Fortran 77 standard as possible.
Double precision floating point resolution is used extensively to minimize numerical error in the reference LD-CELP
modules. The programs have been compiled with a commercially available Fortran compiler to produce executable versions
for 386/87-based PC's. The READ.ME file in the distribution describes how to create executable programs on other
computers.



I.3.1 CWCOMP

The CWCOMP program is a simple tool to compare the content of two codeword files. The user is prompted for
two codeword file names, the reference encoder output (filename in last column of Table I-1/G.728) and the test encoder
output. The program compares each codeword in these files and writes the comparison result to terminal. The requirement
for test configuration 2 is that no different codewords should exist.



I.3.2 SNR

The SNR program implements a signal-to-noise ratio measurement between two signal files. The first is a
reference file provided by the reference decoder program, and the second is the test decoder output file. A global SNR, GLOB,
is computed as the total file signal-to-noise ratio. A segmental SNR, SEG256, is computed as the average signal-to-noise
ratio of all 256-sample segments with reference signal power above a certain threshold. Minimum segment SNRs are
found for segments of length 256, 128, 64, 32, 16, 8 and 4 with power above the same threshold.

To run the SNR program, the user needs to enter names of two input files. The first is the reference decoder
output file as described in the last column of Table I-3/G.728. The second is the decoded output file produced by the decoder
under test. After processing the files, the program outputs the different SNRs to terminal. Requirement values for the test
configurations 2 and 4 are given in terms of these SNR numbers.
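
The global and segmental measures described for the SNR program can be sketched as below. This is not the Fortran verification software; the function names, the default power threshold and the handling of empty or silent inputs are assumptions made for illustration, since the text does not give the threshold value.

```python
import math

def snr_db(ref, test):
    """Signal-to-noise ratio in dB of test against ref over the whole span."""
    signal = sum(r * r for r in ref)
    noise = sum((r - t) ** 2 for r, t in zip(ref, test))
    if noise == 0.0:
        return math.inf
    if signal == 0.0:
        return -math.inf
    return 10.0 * math.log10(signal / noise)

def segment_snrs(ref, test, seg_len=256, power_threshold=1.0):
    """SNR of each full segment whose mean reference power exceeds the threshold."""
    values = []
    for start in range(0, len(ref) - seg_len + 1, seg_len):
        r = ref[start:start + seg_len]
        t = test[start:start + seg_len]
        if sum(x * x for x in r) / seg_len > power_threshold:   # assumed threshold test
            values.append(snr_db(r, t))
    return values

# Toy signals: a constant reference and a test signal that is off by one unit.
ref = [100.0] * 512
test = [99.0] * 512
glob = snr_db(ref, test)                      # global SNR over the whole file
seg = segment_snrs(ref, test, seg_len=256)
seg256 = sum(seg) / len(seg)                  # segmental SNR: average over segments
min_seg = min(segment_snrs(ref, test, seg_len=4))   # minimum over 4-sample segments
```

The minimum segment SNR for the other lengths (128, 64, 32, 16, 8) follows by changing seg_len in the last line.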
I.3.3 WSNR

The WSNR algorithm is based on a reference decoder and distance measure implementation to compute the mean
perceptually weighted distortion of a codeword sequence. A logarithmic signal-to-distortion ratio is computed for every
5-sample signal vector, and the ratios are averaged over all signal vectors with energy above a certain threshold.

To run the WSNR program, the user needs to enter names of two input files. The first is the encoder input signal
file (first column of Table I-1/G.728) and the second is the encoder output codeword file. After processing the sequence,
WSNR writes the output WSNR value to terminal. The requirement value for test configuration 3 is given in terms of this
WSNR number.

I.3.4 LDCDEC
I.4 Test sequences

The following is a description of the test sequences to be applied. The description includes the specific
requirements for each sequence.

I.4.1 Naming conventions

The test sequences are numbered sequentially, with a prefix that identifies the type of signal:

IN: encoder input signal

INCW: encoder output codewords

CW: decoder input codewords

OUTA: decoder output signal without postfilter

OUTB: decoder output signal with postfilter

All test sequence files have the extension .BIN.

I.4.2 File formats

The signal files, according to the LD-CELP interfaces SU and SPF (file prefix IN, OUTA and OUTB) are all in
2's complement 16 bit binary format and should be interpreted to have a fixed binary point between bit #2 and #3 as
shown in Figure I-5/G.728. Note that all the 16 available bits must be used to achieve maximum precision in the test
measurements.

The codeword files (LD-CELP signal ICHAN, file prefix CW or INCW), are stored in the same 16 bit binary
format as the signal files. The least significant 10 bits of each 16 bit word represent the 10 bit codeword, as shown in
Figure I-5/G.728. The other bits (#10-#15) are set to zero.

Both signal and codeword files are stored in the low-byte first word storage format that is usual on IBM/DOS and
VAX/VMS computers. For use on other platforms, such as UNIX machines, this ordering may have to be changed
by a byteswap operation.
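
Reading these files on another platform might be sketched as follows. The sketch assumes 16-bit words stored low byte first, a signal binary point that leaves three fractional bits (so the sample value is the integer divided by 8), and a codeword held in the 10 least significant bits of each word. These interpretations, and all function names, are assumptions of this illustration and should be checked against Figure I-5/G.728.

```python
import struct

def read_words(data: bytes, low_byte_first: bool = True):
    """Unpack raw bytes into signed 16-bit words."""
    count = len(data) // 2
    fmt = ("<" if low_byte_first else ">") + f"{count}h"
    return list(struct.unpack(fmt, data[: 2 * count]))

def byteswap(data: bytes) -> bytes:
    """Swap the two bytes of every 16-bit word."""
    count = len(data) // 2
    return b"".join(data[2 * i + 1: 2 * i + 2] + data[2 * i: 2 * i + 1] for i in range(count))

def signal_value(word: int) -> float:
    """Signal sample with an assumed three fractional bits."""
    return word / 8.0

def codeword(word: int) -> int:
    """10 bit codeword, assumed to sit in the low 10 bits of the word."""
    return word & 0x3FF

# One word with integer value 8, stored low byte first.
words = read_words(b"\x08\x00")
sample = signal_value(words[0])
```

Passing low_byte_first=False, or applying byteswap first, covers files that have already been converted to the other byte order.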
TABLE I-1/G.728
Encoder tests

Input | Length | Description of test | Test

TABLE I-3/G.728
Decoder tests

Input signal | Length, vectors | Description of test | Test config. | Output signal

TABLE I-4/G.728
Decoder test requirements
I.5 Verification tools distribution

All the files in the distribution are stored on 1.44 Mbyte 3.5" DOS diskettes. Diskette copies can be ordered
from the ITU at the following address:

ITU General Secretariat
Sales Service
Place des Nations
CH-1211 Geneve 20
Switzerland

A README file is included on diskette #1 to describe the content of each file and the procedures necessary to
compile and link the programs. Extensions are used to separate different file types. *.FOR files are source code for the
Fortran programs, *.EXE files are 386/87 executables and *.BIN are binary test sequence files. The content of each
diskette is listed in Table I-5/G.728.

TABLE I-5/G.728
Distribution directory

Disk                  Filename        Number of bytes

Diskette #1           README          10430
Total size:           CWCOMP.FOR      2642
1 289 859 bytes
Claims

1. A method of generating linear prediction filter coefficient signals during frame erasure, the generated
linear prediction coefficient signals for use by a linear prediction filter in synthesizing a speech signal, the
method comprising the steps of:

storing linear prediction coefficient signals in a memory, said linear prediction coefficient signals
generated responsive to a speech signal corresponding to a non-erased frame; and

responsive to a frame erasure, scaling one or more of said stored linear prediction coefficient
signals by a scale factor, BEF raised to an exponent i, where 0.95 ≤ BEF ≤ 0.99 and where i indexes the stored
linear prediction coefficient signals, the scaled linear prediction coefficient signals applied to the linear
prediction filter for use in synthesizing the speech signal.

2. The method of claim 1 wherein BEF is substantially equal to 0.97.

3. The method of claim 1 wherein BEF is substantially equal to 0.98.

4. The method of claim 1 wherein the linear prediction filter comprises a 50th order linear prediction filter
and said exponent indexes 50 linear prediction coefficient signals.

5. The method of claim 1 wherein the linear prediction filter comprises a filter of an order greater than 20
and said exponent indexes a number of linear prediction coefficient signals, the number equal to the order
of the filter.






6. The method of claim 1 wherein the step of scaling is performed once per erased frame. 

7. A method of synthesizing a signal reflecting human speech, the method for use by a decoder which ex- 
periences an erasure of input bits, the decoder including a first excitation signal generator responsive to 
said input bits and a synthesis filter responsive to an excitation signal, the method comprising the steps 
of: 

storing samples of a first excitation signal generated by said first excitation signal generator; 
responsive to a signal indicating the erasure of input bits, synthesizing a second excitation signal 
based on previously stored samples of the first excitation signal; and 

filtering said second excitation signal to synthesize said signal reflecting human speech; 
wherein the step of synthesizing a second excitation signal includes the steps of: 

correlating a first subset of samples stored in said memory with a second subset of samples stored 
in said memory, at least one of said samples in said second subset being earlier than any sample in said 
first subset; 

identifying a set of stored excitation signal samples based on a correlation of first and second sub- 
sets; 

forming said second excitation signal based on said identified set of excitation signal samples. 

8. The method of claim 7 wherein the step of forming said second excitation signal comprises copying said 
identified set of stored excitation signal samples for use as samples of said second excitation signal. 

9. The method of claim 7 wherein said identified set of stored excitation signal samples comprises five
consecutive stored samples.

10. The method of claim 7 further comprising the step of storing samples of said second excitation signal in
said memory.

11. The method of claim 7 further comprising the step of determining whether erased input bits likely represent
non-voiced speech.

12. The method of claim 7 wherein:

the step of correlating comprises determining a time lag value between first and second subsets
of samples corresponding to a maximum correlation; and

the step of identifying a set of stored excitation signal samples comprises identifying said samples
based on said time lag value.

13. The method of claim 12 further comprising the steps of:

in accordance with a test, determining whether erased input bits likely represent a signal of very
low periodicity; and

if erased input bits are determined to represent a signal of very low periodicity, modifying said time
lag value.

14. The method of claim 13 wherein said test comprises comparing a weight of a single tap pitch predictor to
a threshold.

15. The method of claim 13 wherein said test comprises comparing the maximum correlation to a threshold.

16. The method of claim 13 wherein the step of modifying said time lag value comprises incrementing said
time lag value.
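
For illustration only, the coefficient scaling recited in claim 1 can be sketched as below. The function name scale_lpc and the list representation of the stored coefficients are hypothetical; the sketch simply multiplies the i-th stored coefficient by BEF raised to the power i, with i starting at 1.

```python
def scale_lpc(stored_coeffs, bef=0.97):
    """Scale the i-th stored LPC coefficient (i = 1, 2, ...) by bef**i."""
    if not 0.95 <= bef <= 0.99:
        raise ValueError("BEF outside the range 0.95 to 0.99")
    return [a * bef ** i for i, a in enumerate(stored_coeffs, start=1)]

# Toy example with three stored coefficients (values chosen arbitrarily).
last_good = [1.0, -0.5, 0.25]
erased_frame_coeffs = scale_lpc(last_good, bef=0.97)
```

Because the factor shrinks with the coefficient index, higher-order coefficients are attenuated more strongly, which corresponds to the bandwidth expansion of spectral peaks described in the abstract.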